PewDiePie Claims His Fine-Tuned AI Model Outperforms ChatGPT-4 in Coding Benchmark

On February 26, 2026, popular YouTuber Felix "PewDiePie" Kjellberg made waves in the artificial intelligence community. In a video titled "I Trained My Own AI… It beat ChatGPT," the content creator asserted that an AI model he personally fine-tuned outperformed OpenAI's ChatGPT-4 and several other prominent AI systems.

The Viral Announcement and Benchmark Claims

Right at the beginning of his video, PewDiePie declared to his audience, "The deed is done. I can finally return to this channel 'cause I have done what I said I was going to do. I trained my own AI model." He then presented what he described as official benchmark results that supposedly validated his extraordinary claim.

According to PewDiePie's presentation, his customized model allegedly outperformed multiple established AI systems. "I ran the benchmarks, official AI benchmark, and my model outperforms DeepSeek 2.5, way bigger model than mine. Facebook's flagship model, LLaMA 4 Maverick, destroyed. Bang! Most importantly of all, my model outperforms ChatGPT's 4 in like November or something," he explained in the video.

The content quickly gained significant traction across social media platforms, with fans and technology enthusiasts engaging in heated discussions about the legitimacy and seriousness of these comparative claims.

Technical Details: Fine-Tuning Existing Models

As the video progressed, PewDiePie clarified an important technical distinction. He emphasized that he did not develop a completely new artificial intelligence system from the ground up. "I have not created my own AI. I have merely taken an AI model and trained it. It's like stealing a child on the street instead of birthing one myself. It's way more effective that way. Plus, it would cost millions and millions of dollars in infrastructure, which I do not have yet," he elaborated.

The foundation of his project was Qwen 32B, an existing AI model already recognized for its strong coding capabilities. However, PewDiePie wanted to enhance its performance specifically within a particular coding format. "The model that I used was Qwen 32B, which is already amazing at coding, but I needed it to be amazing at coding in this format," he explained.
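Fine-tuning an existing model in this way usually means updating only a small fraction of its parameters rather than retraining everything. One common technique for this is LoRA (low-rank adaptation); the video does not confirm which method was used, so the sketch below is purely illustrative of the general idea, with all names and sizes hypothetical:

```python
import numpy as np

# Illustrative LoRA-style adapter (hypothetical, NOT PewDiePie's actual setup).
# Instead of updating a full weight matrix W (d x d), we learn two small
# matrices B (d x r) and A (r x d) with rank r << d, and compute
# the adapted forward pass as  y = W @ x + B @ (A @ x).

rng = np.random.default_rng(0)
d, r = 1024, 8                       # model dimension vs. adapter rank

W = rng.standard_normal((d, d))      # frozen pretrained weight (never trained)
B = np.zeros((d, r))                 # B starts at zero...
A = rng.standard_normal((r, d))      # ...so the adapter is a no-op initially

x = rng.standard_normal(d)
y = W @ x + B @ (A @ x)              # base output plus (initially zero) adapter

# Trainable parameters: 2*d*r for the adapter vs. d*d for full fine-tuning.
full_params = d * d
lora_params = 2 * d * r
print(f"full: {full_params:,}  adapter: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.2%}")
```

With these toy dimensions the adapter holds under 2 percent of the parameters a full fine-tune would touch, which is why adapting a strong base model like Qwen 32B is feasible on consumer-grade hardware while pretraining one is not.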

The Benchmark Test and Format Adjustment

During his testing phase, PewDiePie focused on a specialized benchmark called Aider Polyglot, which evaluates coding proficiency across six programming languages. According to his analysis, leading AI models underperformed expectations on this particular assessment.

He reported that ChatGPT scored 18.2 percent on this test, while the original Qwen 32B model initially achieved only 8 percent. The crucial breakthrough came when he switched the evaluation's edit format from "diff" to "whole." This technical adjustment reportedly boosted the score to 16 percent.

To simplify this concept for his audience, PewDiePie compared it to drawing a picture and then adding a cloud without needing to redraw the entire composition. His central argument emphasized that specific formats matter significantly, and relatively minor technical modifications can substantially alter performance outcomes.
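The "add a cloud without redrawing the picture" analogy maps onto how these edit formats differ: in a "whole"-style format the model re-emits the entire file, while a "diff"-style format asks it to name only the lines to change. A minimal sketch of the contrast (simplified and hypothetical, not Aider's actual parser):

```python
# Simplified contrast between two LLM code-edit formats (illustrative only).
# "whole": the model rewrites the full file.
# "diff":  the model emits (search, replace) pairs applied to the original.

original = "def area(r):\n    return 3.14 * r * r\n"

def apply_whole(_original: str, model_output: str) -> str:
    """'Whole' format: the model's output simply replaces the file."""
    return model_output

def apply_diff(text: str, edits: list[tuple[str, str]]) -> str:
    """'Diff' style: apply each (search, replace) pair to the file."""
    for search, replace in edits:
        if search not in text:
            # A malformed search block makes the whole edit fail outright.
            raise ValueError(f"search block not found: {search!r}")
        text = text.replace(search, replace, 1)
    return text

# Whole format: everything is re-emitted, even unchanged lines.
whole_out = "import math\n\ndef area(r):\n    return math.pi * r * r\n"

# Diff format: only the changed regions are named.
diff_out = [("return 3.14 * r * r", "return math.pi * r * r"),
            ("def area", "import math\n\ndef area")]

assert apply_whole(original, whole_out) == apply_diff(original, diff_out)
```

Because a diff-style edit fails entirely if its search block does not match the file exactly, a model that is strong in one format can score far worse in another, which is consistent with the 8 percent versus 16 percent gap he reported for the same base model.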

Context and Broader Implications

This experiment reflects PewDiePie's growing engagement with open-source artificial intelligence. Last year, he shared that he had successfully hosted large AI models, including OpenAI's GPT-OSS 120B, on his personal computer hardware.

Whether his fine-tuned model genuinely "beat" ChatGPT in a comprehensive, real-world sense remains subject to ongoing debate within the AI community. However, his video has undoubtedly ignited fresh conversations about how artificial intelligence tools are evaluated, compared, and benchmarked across different testing scenarios.

The incident highlights increasing accessibility to AI customization tools and raises questions about standardized evaluation methodologies in the rapidly evolving field of artificial intelligence development and testing.