AI Models Crack UPSC Prelims 2025: ChatGPT, Gemini, Claude Score Above Cutoff

Every year, over 10 lakh aspirants dedicate years to preparing for India's most gruelling examination, the UPSC Civil Services Preliminary. The General category cutoff in 2025 was 92.66 marks out of 200, and with negative marking for wrong answers, a single bad guess can end a dream. With AI tools like ChatGPT, Gemini, and Claude now used by lakhs of students as study companions, a natural question emerged: could these AIs actually sit the exam themselves? We decided to find out.

How We Tested the AI Models

We used the actual UPSC CSE Prelims GS Paper 1 from 2025 (May 25, 2025) and 2024 (June 16, 2024), with official answer keys. All 100 questions from each paper were fed individually to ChatGPT (GPT-5, May 2026), Gemini (2.5 Pro), and Claude (Sonnet 4.5). Each AI received plain text questions with options labeled (a) through (d) and was asked to identify the single correct answer with one-line reasoning. No web search or system prompt priming was enabled. The only advantage was the knowledge absorbed during training, similar to a well-prepared human aspirant.

Scoring Method

We applied the UPSC marking scheme: +2 for correct, -0.67 for incorrect, 0 for unattempted. All three AIs attempted all 100 questions.
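The marking scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the script used for the test: the function name `prelims_score` and the sample candidate are hypothetical, and the penalty uses the rounded -0.67 figure quoted above.

```python
# Marking scheme as stated in this article (penalty rounded to 0.67).
CORRECT_MARKS = 2.0
WRONG_PENALTY = 0.67


def prelims_score(correct: int, incorrect: int, unattempted: int = 0,
                  total_questions: int = 100) -> float:
    """Return marks for one GS Paper 1 attempt (hypothetical helper)."""
    if correct + incorrect + unattempted != total_questions:
        raise ValueError("counts must add up to the number of questions")
    # Unattempted questions contribute 0, so they do not appear in the sum.
    return round(correct * CORRECT_MARKS - incorrect * WRONG_PENALTY, 2)


# Hypothetical candidate: 60 right, 30 wrong, 10 skipped.
print(prelims_score(60, 30, 10))  # 99.9
```

Skipping a question costs nothing, while each wrong attempt cancels roughly a third of a correct one, which is why blind guessing is penalised.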

About the 2025 Paper

The 2025 GS Paper 1 was moderate to difficult. Economics dominated with 18 questions, followed by Environment and Ecology (15), History and Culture (15), Polity (14), and Science and Technology (12). The paper featured many multi-statement verification questions, which punish guessing. The official General category cutoff was 92.66 marks, the highest since 2020.
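To put the cutoff in perspective, the sketch below estimates how many correct answers are needed to reach it when every question is attempted. It assumes the +2 / -0.67 scheme used in this article; the helper name `min_correct_to_clear` is hypothetical.

```python
import math


def min_correct_to_clear(cutoff: float, attempted: int = 100,
                         correct_marks: float = 2.0,
                         wrong_penalty: float = 0.67) -> int:
    """Smallest number of correct answers that reaches `cutoff`
    when all `attempted` questions are answered (hypothetical helper)."""
    # Solve c * correct_marks - (attempted - c) * wrong_penalty >= cutoff for c.
    needed = (cutoff + attempted * wrong_penalty) / (correct_marks + wrong_penalty)
    return math.ceil(needed)


print(min_correct_to_clear(92.66))  # 60
```

Under these assumptions, a candidate attempting all 100 questions needs 60 correct answers to clear a 92.66 cutoff; 59 correct would yield about 90.5 marks.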

Final Scorecard: UPSC Prelims 2025

ChatGPT (GPT-5) scored approximately 118 marks (73 correct out of 100, 73% accuracy), Gemini 2.5 Pro scored approximately 122 marks (76 correct, 76% accuracy), and Claude Sonnet 4.5 scored approximately 112 marks (68 correct, 68% accuracy). All three cleared the cutoff of 92.66 marks.

Subject-wise breakdowns revealed significant differences.

ChatGPT achieved 80% in History/Culture, 75% in Science & Tech, 72% in Economy, 67% in Environment, 79% in Polity, 57% in Current Affairs, and 75% in Geography.

Gemini scored 87% in History/Culture, 67% in Science & Tech, 72% in Economy, 73% in Environment, 79% in Polity, 64% in Current Affairs, and 75% in Geography.

Claude scored 80% in History/Culture, 67% in Science & Tech, 67% in Economy, 60% in Environment, 79% in Polity, 57% in Current Affairs, and 67% in Geography.
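The subject-wise percentages are easier to compare when held in a small lookup table. The sketch below copies the figures from the scorecard above and picks out each model's strongest and weakest subject; the names `SUBJECT_ACCURACY` and `best_and_worst` are illustrative, not part of any tool used in the test.

```python
# Subject-wise accuracy (%) as reported in the scorecard above.
SUBJECT_ACCURACY = {
    "ChatGPT (GPT-5)": {
        "History/Culture": 80, "Science & Tech": 75, "Economy": 72,
        "Environment": 67, "Polity": 79, "Current Affairs": 57, "Geography": 75,
    },
    "Gemini 2.5 Pro": {
        "History/Culture": 87, "Science & Tech": 67, "Economy": 72,
        "Environment": 73, "Polity": 79, "Current Affairs": 64, "Geography": 75,
    },
    "Claude Sonnet 4.5": {
        "History/Culture": 80, "Science & Tech": 67, "Economy": 67,
        "Environment": 60, "Polity": 79, "Current Affairs": 57, "Geography": 67,
    },
}


def best_and_worst(scores: dict[str, int]) -> tuple[tuple[str, int], tuple[str, int]]:
    """Return (best, worst) as (subject, percent) pairs."""
    best = max(scores.items(), key=lambda item: item[1])
    worst = min(scores.items(), key=lambda item: item[1])
    return best, worst


for model, scores in SUBJECT_ACCURACY.items():
    (best_subj, best_pct), (worst_subj, worst_pct) = best_and_worst(scores)
    print(f"{model}: best {best_subj} ({best_pct}%), worst {worst_subj} ({worst_pct}%)")
```

On these figures, History/Culture is the strongest subject and Current Affairs the weakest for all three models.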

Sample Questions and Responses

Representative examples from the 2025 paper:

Alternative powertrain vehicles: all three AIs correctly answered option (c).
UAV capabilities: only ChatGPT was correct; Gemini and Claude were wrong.
Common characteristic of CL-20, HMX, and LLM-105: only Gemini was correct.
Monoclonal antibodies: only ChatGPT was correct.
Virus statements: all three were correct.
India and the COP28 health declaration: ChatGPT and Claude were correct; Gemini was wrong.
Nature Solutions Finance Hub: only Gemini correctly identified AIIB over ADB.
Direct air capture technology applications: only Gemini was correct.
Peacock tarantula habitat: only Gemini was correct.
Non-Cooperation Programme components: only Gemini was correct.
Titles Mattavilasa, Vichitrachitta, and Gunabhara: all three were correct.
Fa-hien's travel: all three were correct.
Military campaign against Srivijaya: all three were correct.
Ancient Mahajanapadas paired with rivers: ChatGPT and Gemini were correct; Claude was wrong.
Gandharva Mahavidyalaya: all three were correct.

Analysis of AI Performance

Gemini 2.5 Pro: Frontrunner

Gemini performed strongest overall, driven by better handling of current affairs and environment questions than its rivals. It correctly identified AIIB for the Nature Solutions Finance Hub, while rivals chose ADB. Gemini was also the only model to answer the peacock tarantula habitat, direct air capture, and Non-Cooperation Programme questions correctly. Its best subject was History and Culture (87%); its worst was Current Affairs (64%), just below Science and Technology (67%).

ChatGPT GPT-5: Consistent but Cautious

ChatGPT delivered solid performance across subjects, with strengths in History and Polity. It struggled with Environment and Current Affairs. For the CL-20/HMX/LLM-105 question, it chose explosives over cruise missile fuel, reflecting a tendency toward broader categories. Best subject: History and Culture (80%), with Polity close behind (79%). Worst: Current Affairs (57%).

Claude Sonnet 4.5: Reliable Reasoner

Claude cleared the cutoff with the slimmest margin of the three. It excelled in structured reasoning questions (Statement I/II format) but struggled with specific current affairs and environment questions. It was the only AI to get the Mahajanapadas-rivers pairing wrong. Best subject: History and Culture (80%), with Polity close behind (79%). Worst: Current Affairs (57%), followed by Environment (60%).

Subject-Wise Analysis

History and Culture: All three AIs scored 80% or above, confidently handling textbook questions on Fa-Hien, Rajendra I, Araghatta irrigation, and Ashokan administration.

Current Affairs and Environment: Accuracy dropped sharply. Questions about recent institutional launches or the habitats of obscure species stumped ChatGPT and Claude (57% each on Current Affairs).

Science and Technology: Surprising failures occurred on technical details, such as the common characteristic of CL-20, HMX, and LLM-105 and the applications of direct air capture.

2024 Paper Benchmark

The 2024 UPSC Prelims had a cutoff of 88 marks. On a 30-question sample from that paper, all three AIs performed 2-5 percentage points better than on the 2025 paper. Notably, in 2024, an IIT-founded AI app called PadhAI, trained specifically on UPSC data, scored between 170 and 185 marks live at the exam venue, while generic ChatGPT scored only 75 marks and failed to clear the cutoff. By 2025-26, the gap has narrowed dramatically: GPT-5 and Gemini 2.5 Pro now clear prelims without UPSC-specific training.

Can AI Actually Crack UPSC?

Clearing Prelims is only the first stage. UPSC also includes Mains (descriptive answers) and the Personality Test (interview). Mains requires analytical writing, policy awareness, and connecting historical precedent with contemporary governance—tasks no AI can currently perform. The Personality Test assesses character, leadership, and decision-making under ambiguity, which language models lack. However, AI has raised the floor: aspirants using these tools for concept clarity and revision enter the exam hall better prepared than previous generations.

What This Means for Aspirants

The questions where all three AIs failed—specific recent events, precise wildlife details, fine-grained institutional knowledge—are exactly what separate toppers from the rest. An AI scoring 76% on Prelims is a powerful study partner, but the remaining 24% requires human discipline: daily news reading, environmental study, and memorizing specific dates. No shortcut exists.

UPSC examiners are aware of this landscape. In 2025, about 22-28% of GS Paper 1 questions were current-affairs-adjacent, drawing on events from the past 12-18 months. For AI models with training cutoffs, this is a structural blind spot. For aspirants relying heavily on AI for current affairs, it is a warning.

Final Verdict

ChatGPT (GPT-5) scored approximately 118 marks, Gemini 2.5 Pro approximately 122 marks, and Claude Sonnet 4.5 approximately 112 marks. All three clear Prelims with a reasonable margin above the cutoff. Yes, AI can crack UPSC Prelims in 2026. But cracking UPSC as a whole requires sustained multi-year preparation, real-time awareness, analytical writing, and human judgment—qualities that remain hardest to automate.