The news: In the wake of fallout over GPT-4o’s overly flattering behavior, researchers from Stanford University, Carnegie Mellon University, and the University of Oxford released a new benchmark—Evaluation of LLMs as Excessive SycoPHANTs (Elephant)—to test sycophancy in LLMs, per VentureBeat.
Researchers tested how often models flatter users, avoid critique, and reinforce false beliefs. All tested models showed high levels of social sycophancy—with some worse than humans.
Reflecting on AI’s eagerness to please: Sycophantic AI may seem harmless, but it creates serious risks for enterprise use, especially when models validate user input without critique and spread misinformation, reinforce bias, and degrade trust.
The evaluation:
Problematic findings: GPT-4o showed the highest sycophancy, while Gemini 1.5 Flash had the lowest.
Although empathetic AI enhances engagement, unchecked agreeableness undermines safety and accuracy. Data from Five9’s 2025 Customer Experience Report shows sharp generational divides in how consumers perceive AI’s trustworthiness.
Sycophantic AI could risk deepening bias for younger users, and for older, more skeptical generations, flattery may appear manipulative—further widening the trust gap.
Our take: AI’s propensity for sycophancy and reinforcing bias could be as dangerous as its penchant for hallucinations. For businesses using LLMs in customer service, HR, or decision support, AI’s unchecked flattery problem could threaten brand integrity and compliance.