At Cognizant's AI Lab in Bengaluru, Babak Hodjat, Chief AI Officer of the Nasdaq-listed IT giant, issued a stark warning to enterprises rushing to integrate artificial intelligence into their core operations. He highlighted a fundamental and unresolved challenge: the reasoning of Large Language Models (LLMs) cannot be fully trusted when they are deployed at scale.
The Scaling Problem: When AI Reasoning Breaks Down
Hodjat explained that while today's LLMs demonstrate impressive power, they consistently falter when pushed into longer or more complex chains of logical reasoning. This breakdown poses a severe risk for businesses implementing AI across intricate systems like telecom networks, financial platforms, and global supply chains, where decisions can compound over millions of sequential steps.
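The arithmetic of compounding makes the risk concrete. As a back-of-envelope sketch (assuming independent, uniform per-step errors, which is a deliberate simplification), even near-perfect per-step accuracy collapses over a million-step chain:

```python
# Back-of-envelope: with independent per-step accuracy p, the probability
# that an n-step chain completes with zero errors is p ** n.
N_STEPS = 1_000_000

for p in (0.999, 0.9999, 0.99999):
    print(f"per-step accuracy {p}: P(error-free run) = {p ** N_STEPS:.3g}")
# Even 99.999% per-step accuracy gives roughly a 0.005% chance of success.
```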
"We are far from having a single panacea to determine if an AI system's output can be trusted," Hodjat stated, especially as the industry moves towards autonomous, multi-agent AI ecosystems. Research presented by Cognizant indicates that even the most advanced models experience what he termed "catastrophic breakdowns" when executing extended reasoning sequences.
To illustrate, he cited the classic Tower of Hanoi puzzle. Though the puzzle is logically simple, LLMs begin to make errors after just a few hundred reasoning steps, a trivial number compared to the demands of real-world enterprise workflows.
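For a sense of how fast such chains grow, the optimal solution for n disks requires 2^n - 1 moves, each depending on the ones before it. A minimal sketch of the standard textbook recursion (the classic puzzle itself, not Cognizant's evaluation harness):

```python
# Classic recursive Tower of Hanoi: move n disks from src to dst via aux.
def hanoi_moves(n, src="A", dst="C", aux="B"):
    """Yield the optimal move sequence for n disks."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, aux, dst)   # clear the way
    yield (src, dst)                               # move the largest disk
    yield from hanoi_moves(n - 1, aux, dst, src)   # restack on top

for n in (8, 10, 20):
    print(f"{n} disks -> {sum(1 for _ in hanoi_moves(n)):,} moves")
# 8 disks -> 255 moves; 10 -> 1,023; 20 -> 1,048,575
```

A model that slips after a few hundred steps cannot reliably finish even the ten-disk case, let alone an enterprise workflow.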
Building Trust: Human Loops and Semantic Confidence
To mitigate these risks, Cognizant has proactively embedded multiple human-in-the-loop safeguards into its AI architectures. One mechanism automatically triggers human intervention when the AI's own confidence score falls below a set threshold. Another is designed to identify and escalate unresolved contradictions between different AI agents to human operators for a final decision.
"If I hit a red flag on some agent doing something that is off policy, I stop the agent right there and I can surface it to a human," Hodjat explained.
A core challenge, he noted, is the current lack of reliable tools to measure an LLM's genuine confidence in its answers. In response, Cognizant's research lab developed a technique called 'semantic density,' which samples the model's answers to the same question many times and measures how tightly those answers cluster in semantic space. Consistent answers earn a high confidence score; scattered answers flag the output as unreliable, giving engineers a quantitative way to assess AI-generated responses.
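The published method is more involved than raw agreement, but the core sampling idea can be sketched as follows; ask_llm and embed are hypothetical stand-ins for a model call and an embedding service, and mean pairwise similarity is a crude proxy for the density estimate the technique actually computes:

```python
# Rough sketch of consistency sampling: answers that cluster tightly in
# embedding space suggest the model is confident; scattered answers do not.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def semantic_confidence(ask_llm, embed, question, n_samples=10):
    """Sample the model n_samples times and score answer consistency."""
    vecs = [embed(ask_llm(question)) for _ in range(n_samples)]
    pairs = [(i, j) for i in range(n_samples) for j in range(i + 1, n_samples)]
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)
```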
From Periodic Audits to Real-Time AI Governance
Hodjat emphasized that trust cannot be established through periodic audits alone. It must be enforced through continuous, real-time governance. Cognizant has consequently built a live governance layer that monitors AI systems for compliance with privacy, security, bias, and regulatory rules. If an AI agent violates a policy—such as potentially leaking sensitive data—it can be halted instantly and handed to a human reviewer.
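In miniature, such a live policy gate might look like the following, with a toy PII pattern standing in for the real rule set, which Cognizant has not published:

```python
# Toy governance wrapper: every agent output passes a policy check before it
# leaves the system; a violation halts the agent and raises for human review.
import re

PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # toy SSN-style detector

def governed(action_fn):
    """Wrap an agent action so its output is screened in real time."""
    def wrapper(*args, **kwargs):
        output = action_fn(*args, **kwargs)
        if PII_PATTERN.search(str(output)):
            raise PermissionError(
                "Policy violation: possible PII leak; agent halted "
                "and queued for human review")
        return output
    return wrapper
```

The same wrapper pattern extends to bias, security, and regulatory checks by swapping in the relevant predicates.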
Interestingly, Cognizant's research suggests that reliability may not come from ever-larger monolithic models. Instead, it improves when numerous simpler, specialized agents collaborate using voting and error-correction mechanisms inspired by telecommunications networks. In a landmark experiment, the company achieved error-free reasoning across one million sequential steps by distributing the task across a million small agents—a scale where single, massive LLMs would inevitably fail.
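A toy version of per-step redundancy and voting, in the spirit of that experiment (the redundancy factor and agent interface here are assumptions):

```python
# Per-step majority voting: several independent small agents attempt the same
# step, and the majority answer masks any single agent's sporadic error.
from collections import Counter

def voted_step(agents, state, redundancy=5):
    """Run one step on `redundancy` agents and return the majority answer."""
    proposals = [agent(state) for agent in agents[:redundancy]]
    winner, votes = Counter(proposals).most_common(1)[0]
    if votes <= redundancy // 2:
        raise RuntimeError("No majority; escalate this step for review")
    return winner
```

Chained over many steps, this is the same redundancy logic that lets noisy telecom channels deliver clean signals.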
"We had non-reliable systems forever and we know how to fix that. We know how to error-correct," Hodjat remarked, drawing parallels to engineering principles from other fields.
The Benchmark Illusion and the Path Forward
As AI systems gain greater influence over critical infrastructure, business revenue, and daily life, Hodjat cautioned against conflating strong performance on standardized benchmarks with real-world trustworthiness. "LLMs are astonishingly capable," he concluded, "but unless we build better ways to evaluate them, monitor them, and bring humans into the loop at the right moments, we're taking risks we don't fully understand."
The message from Bengaluru is clear: the race to adopt AI must be matched by an equally vigorous race to implement robust, human-centric oversight and governance frameworks. For Indian enterprises betting big on AI, the trust deficit highlighted by Cognizant's top AI executive is a crucial strategic consideration.