When the “AI consultant” aced the test, the real story was human status anxiety
The SAP experiment that made headlines this week is not really about an overhyped AI consultant. It is about how professionals protect their status, even when the work in front of them is solid. In the study, SAP’s Joule for Consultants generated answers to more than 1,000 consulting‑style business requirements, and senior consultants were asked to rate the quality without knowing where the work came from. When they believed the outputs came from junior humans, they scored the work around 95 percent accurate. When told the same outputs came from an AI copilot, many suddenly rated them much lower. The content did not change. Only the label did. (VentureBeat)

That is status protection in action. The instinct is not “Is this analysis correct?” but “What does it mean for my role if a machine can do this?” If your identity and billing rate are wrapped around being the person who “knows the answer,” watching an AI match your performance is deeply uncomfortable. It is easier to find fault with the machine than to admit that much of what passes for expertise, especially at the junior and mid levels, is pattern matching and structured writing that a large language model can now do quite well. The drop in scores when “AI” entered the picture says more about human ego than it does about model quality.
There is a second motive lurking in the background, and it is more sympathetic. Some consultants are not just afraid of losing face; they are afraid of clients being burned. AI hallucinations are real, and in high‑stakes domains like health, law, or legislation, a single wrong statement can carry real‑world consequences. If clients end up relying on AI‑generated analysis that was never properly reviewed, the fallout will not land on the model vendor. It will land on the firm, the account lead, and the individual consultants in the room. So there is a rational fear that if AI outputs are over‑trusted, catastrophic results are not just possible but eventually guaranteed.
This is where a hypothetical “sixth group” becomes revealing. Imagine running the same experiment, but telling a new cohort of reviewers: “This work was generated by an AI assistant and then reviewed and lightly edited by a senior partner.” Nothing else changes in the underlying content. Based on the original 95 percent accuracy and the way people respond to titles, those scores would likely jump back toward 95 to 100 percent in a hurry. The AI label would suddenly feel less threatening, because a senior human’s name is now wrapped around it. In other words, we do not really trust or distrust the work itself. We trust the story about who touched it.
That is why “AI‑assisted, human‑reviewed” is more than a marketing phrase. It is a concrete way to address both core fears at once. First, it creates a new, visible role for humans: reviewer, AI researcher, prompter, and domain‑expert editor. Instead of competing with the model for who writes the first draft, the consultant is accountable for what ships to the client. Second, it builds a clearer safety lane for end clients. Acknowledging AI involvement, at least internally, forces the firm to design review, sign‑off, and escalation paths that assume hallucinations will happen and must be caught, not hand‑waved away.
In practical terms, that means firms should stop pretending that work is either purely “human” or purely “AI.” They should design workflows where AI generates drafts, analyses, and options, and humans with domain authority are explicitly responsible for verification and judgment. Internally, teams should know exactly where AI was used and who signed off, especially in critical areas like healthcare guidance, legal analysis, or policy recommendations. Externally, firms can decide how much to disclose based on context and regulation, but hiding AI completely is a short‑term tactic that will not survive the first public failure.
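To make the “who signed off” idea concrete, here is a minimal sketch of what an internal provenance record could look like. None of this comes from the SAP study or any SAP product; the `Deliverable` and `RiskTier` names, the fields, and the sign‑off rule are all hypothetical, and a real firm would wire something like this into its existing document and workflow tools.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class RiskTier(Enum):
    LOW = "low"        # internal notes, brainstorming
    MEDIUM = "medium"  # client-facing drafts
    HIGH = "high"      # health, legal, or policy recommendations

@dataclass
class Deliverable:
    title: str
    risk_tier: RiskTier
    ai_generated_sections: list[str] = field(default_factory=list)
    reviewed_by: str | None = None  # the domain expert accountable for the content
    signed_off_at: datetime | None = None

    def sign_off(self, reviewer: str) -> None:
        """Record the named human who takes accountability before shipping."""
        self.reviewed_by = reviewer
        self.signed_off_at = datetime.now(timezone.utc)

    def ready_to_ship(self) -> bool:
        """Anything touched by AI needs a named human sign-off before it ships;
        high-risk work needs one regardless of how it was produced."""
        needs_review = bool(self.ai_generated_sections) or self.risk_tier is RiskTier.HIGH
        return (not needs_review) or self.reviewed_by is not None

# Usage: an AI-drafted policy memo cannot ship until a senior reviewer signs off.
memo = Deliverable("Policy memo", RiskTier.HIGH,
                   ai_generated_sections=["analysis", "options"])
assert not memo.ready_to_ship()
memo.sign_off("senior.partner@example.com")
assert memo.ready_to_ship()
```

The point is not the code but the rule it encodes: AI involvement plus risk tier, not the label on the byline, determines whether a named human must take accountability before anything reaches a client.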
The SAP study offers a useful mirror. It shows that many experts will happily bless AI‑level work when they think a junior colleague wrote it, then downgrade it the moment they see the word “AI.” If the consulting world wants to keep its credibility, it cannot simply lean into that bias and pretend the machines are not there. The path forward is owning AI‑assisted, human‑reviewed workflows, building real guardrails around them, and being honest that the value now lies less in typing the first draft and more in knowing what to ask for, how to check it, and when to say no.
Read more of what I have to say about AI on my Substack.