For the Skeptical Educator: A Completely Honest Account of What AI Can and Cannot Do in Your Classroom

This is not written to convince you. You have sat through enough vendor demos dressed up as professional development to know exactly what persuasion sounds like, and you are right to be tired of it. This is written instead for the most useful person in any staffroom conversation about AI: the teacher who has not yet decided, who is still asking the only question that actually matters — does this help my students learn, or does it just look like it does — and who deserves a straight answer even when the straight answer is complicated.

Your Skepticism Is Not the Problem. It Might Be the Most Useful Instinct in the Room.

Teacher skepticism toward AI is frequently described, including by some EdTech companies, as a problem to be managed — resistance to be overcome, a mindset to be shifted. That framing gets the relationship backwards. A profession that has been told, with confidence, that interactive whiteboards would transform learning, that one-to-one tablet programmes would close achievement gaps, and that MOOCs would democratise education has every reason to demand evidence before adopting the next confident claim. Healthy skepticism is not an obstacle to good decision-making in education. It is a precondition for it.

So this piece will not try to talk you out of your skepticism. It will instead try to give it better information to work with — because the actual research on AI in classrooms, when you read it carefully rather than through a vendor's press release, does not support either of the two stories you have probably heard. It does not support "AI will transform learning for every student in every context," and it does not support "AI is uniformly harmful and should be kept out of classrooms." It supports something more specific and, frankly, more useful: AI helps or harms learning depending on exactly how it is designed and exactly how it is used, and the difference between those two outcomes is knowable in advance.

The Study That Should Make Every AI Vendor Nervous: When AI Tutoring Made Students Worse

A research team led by Hamsa Bastani and Osbert Bastani at the University of Pennsylvania's Wharton School ran a large field experiment in a real high school, giving nearly a thousand math students access to a GPT-4-powered AI tutor. Their findings were published in 2025 in the Proceedings of the National Academy of Sciences, one of the most rigorous and widely respected scientific journals in the world, and they are worth taking seriously precisely because they are not flattering to the AI-in-education industry.

The researchers tested two versions of the tool. One mimicked a standard ChatGPT interface with no instructional guardrails — students could ask it anything and it would simply answer. The second was deliberately constrained to behave like a good tutor: instead of providing answers, it gave teacher-designed hints and required students to work through steps themselves. The result was unambiguous. Students using the unguarded version performed better during practice sessions — because the AI was, quite literally, doing the work for them — but when access to AI was removed and they were tested on the same material independently, those same students performed worse than students who had received no AI tutoring at all. They had not learned the material. They had learned to retrieve the AI's answer.

The guardrailed version told a different story. Students using the tutor designed to prompt explanation and reflection, rather than supply answers, did not show this decline. The skill they built during practice sessions held up when the AI was taken away.

Read that finding again, because it may be the single most important result in the AI-in-education research literature right now: the same underlying AI model produced opposite educational outcomes (measurable harm in one configuration, preserved learning in the other), and the difference came down to a design decision about whether the tool gave answers or prompted students to work them out.

The Study on the Other Side: When AI Tutoring Outperformed Active Learning

Fairness requires presenting the strongest evidence on the other side with equal weight, and it exists. A randomized controlled trial conducted at Harvard by Greg Kestin, Kelly Miller, and colleagues, published in Scientific Reports in 2025, compared a carefully designed AI tutor against in-class active learning — widely considered, prior to this study, the gold standard of evidence-based teaching. Students using the AI tutor in this study learned more in less time and reported higher engagement than students in the active-learning condition.

The crucial detail, easy to miss in a headline but essential to the honest picture, is that this AI tutor was not a generic chatbot dropped into a classroom. It was specifically engineered around research-based pedagogical design — structured to scaffold reasoning, prompt explanation, and sequence difficulty the way an excellent human tutor would, rather than simply answer whatever was asked. The Harvard result and the Wharton result are not contradictory studies disagreeing about whether AI works. They are two halves of the same finding: AI tutoring is not a single intervention with a single effect. It is a category of intervention whose effect depends almost entirely on instructional design.

A separate body of meta-analytic evidence supports this pattern at scale. Earlier syntheses of intelligent tutoring systems, one covering nearly 40 studies and another covering 50, found that well-designed AI tutoring could bring students close to the learning gains associated with one-on-one human tutoring. That is the benchmark behind what Benjamin Bloom famously called the "2 sigma problem" in 1984: individually tutored students outperform conventionally taught ones by a margin that classroom instruction has never been able to match at scale. That is a genuinely significant finding. It is also, notably, a finding about well-designed systems specifically, not about AI access in general.

Where AI Genuinely Struggles — Even in Its Best Implementations

Beyond the design-dependent question of guardrails, there are specific limitations in AI tutoring that show up consistently across studies, regardless of how carefully the tool is built. A 2025 comparative study by Zheng and Li found that AI tutoring systems follow comparatively predictable response patterns and struggle to adjust in real time when a student needs a different kind of explanation, a redirection, or a more involved form of scaffolding than the system anticipated. Human tutors, by contrast, can sense confusion that has not yet been articulated, change their entire approach mid-sentence, and draw out an explanation through a kind of instructional conversation that current AI systems still reproduce only at a surface level.

This is not a minor caveat. It describes precisely the kind of teaching — responsive, adaptive, reading a student's face and adjusting before they have to ask for help — that experienced teachers do almost without noticing, and that no current AI system replicates with real depth. The honest conclusion from the literature is not that AI tutoring is equivalent to skilled human teaching. It is that AI tutoring, well designed, can reliably do some of what good teaching does — structured practice, immediate feedback, infinite patience with repetition — while remaining clearly behind on the parts of teaching that depend on reading a human being in the moment.

A separate and equally honest limitation concerns age. Research consistently finds that older students, with more developed metacognitive skills, get more benefit from AI tools than younger ones, who need more help simply learning how to interact with AI productively in the first place. This is a real argument for designing AI tools differently by grade level, rather than treating AI as equally appropriate, in equally unstructured forms, for a six-year-old and a sixteen-year-old.

The Single Design Principle That the Evidence Actually Supports

Strip away the marketing language and the contradictory headlines, and the research converges on one design principle with unusual consistency: AI that gives answers tends to harm learning. AI that asks questions, demands explanation, and scaffolds reasoning tends to preserve or improve it. This is not a philosophical preference dressed up as science. It is what the Wharton field experiment found directly, what the Harvard trial's carefully engineered tutor reflects in its construction, and what decades of research on human tutoring effectiveness predicted before any of this AI-specific evidence existed.

This single principle is also the most useful filter available to any skeptical teacher evaluating a new AI tool, and it requires no technical expertise to apply. Before adopting anything, ask one direct question: when a student is stuck, does this tool answer the question, or does it ask the student a better one? If the honest answer is that it answers the question, the evidence suggests caution regardless of how impressive the tool looks in a demonstration. If it is built to ask a better one, the evidence is considerably more encouraging.

Where This Leaves a Genuinely Skeptical Decision-Maker

This is precisely the design principle that Cypher, AI Ready School's AI-powered active learning companion, was built around — not as a marketing claim, but as the deliberate architectural choice the Wharton and Harvard research, taken together, points directly toward. Cypher does not default to providing a finished answer when a student is stuck. It is designed to surface a follow-up question, ask a student to justify a step before advancing, and create exactly the kind of structured cognitive friction that the guardrailed tutor in the Wharton study used to prevent learning loss. This is not a claim that Cypher has solved every limitation described above — no current AI system has solved the responsiveness gap that separates AI tutoring from an experienced human teacher reading a confused face in real time. It is a claim that the tool's fundamental design sits on the side of the evidence that holds up, rather than the side that does not.

The honest position for a skeptical educator to take, based on what the research actually shows, is not "AI does not work in classrooms" and it is not "AI transforms learning." It is this: AI tutoring is a powerful but design-dependent intervention, capable of measurable harm in its careless forms and measurable benefit in its careful ones, and the deciding factor is something you, as a classroom professional, are fully equipped to evaluate without anyone's help — does the tool make the student do the thinking, or does it do the thinking for them.

That is a question worth asking of every AI tool that crosses your desk, including any tool built by AI Ready School. Evidence-based skepticism, applied consistently, is the best protection your students have.

The research does not ask you to trust AI. It asks you to evaluate it the way you would evaluate any other instructional method — by whether it makes students capable of doing something on their own that they could not do before. Everything else is noise.

AI Ready School is built around the evidence, not against it — Cypher (a learning companion designed to scaffold reasoning rather than supply answers), Morpheus (an AI teaching agent that keeps the teacher's judgment at the centre), Zion (a governed, school-safe AI suite), NEO (AI Innovation Labs built on hands-on building, not passive use), and Matrix (sovereign AI infrastructure) — designed for educators who ask hard questions before they ask for a demo.

If you have honest questions about what AI can and cannot do in your specific classroom, join our educator Q&A webinar, email hey@aireadyschool.com, or call +91 9100013885.