GPT-5 vs GPT-4o: Is the New ChatGPT Actually Better?
I’ve been using the new GPT-5 for the last few days. Starting today, it becomes your default ChatGPT, even if you’re a free user. So, how is it better? Well, the first thing you’ll probably notice is speed. This laptop has the previous model, GPT-4o, loaded up; this one has GPT-5.
Speed and Performance Comparisons
How many rings does Saturn have? Go. Oh my god. GPT-5 took 3.1 seconds; GPT-4o took 6.5. How many distinct iPhones has Apple made? Go. Oh goodness. Okay, 8.3 seconds versus 11.2 seconds. How many possible type combinations exist in Pokémon? Not far off. Actually, quite far off: GPT-4o has massively overcomplicated it. 5 seconds versus 11 seconds. So really, when you’re asking light questions like this, GPT-5 is roughly 25 to 55% faster than the old model.
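For reference, those per-question speed-ups can be computed directly from the three timings quoted above. This is just arithmetic on the numbers in this section (percent of response time shaved off), not a benchmark:

```python
# Response times in seconds, as quoted above: (GPT-5, GPT-4o)
timings = {
    "Saturn rings": (3.1, 6.5),
    "iPhone models": (8.3, 11.2),
    "Pokemon type combos": (5.0, 11.0),
}

for question, (gpt5, gpt4o) in timings.items():
    # Fraction of GPT-4o's response time that GPT-5 shaved off
    saved_pct = (1 - gpt5 / gpt4o) * 100
    print(f"{question}: {saved_pct:.0f}% faster")
```

Running this gives speed-ups of about 52%, 26%, and 55% respectively, which is where the rough overall range comes from.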
But I wouldn’t really say speed was one of the key issues before. There’s actually a much more important change happening this generation. If you’re a really big user of ChatGPT and you’re paying the monthly subscription, it’s actually been really confusing to use, because you open up the website and you have options: GPT-4o, o3, o3 Pro, o4 Mini, or o4 Mini High. Each of these models is its own AI that you can talk to, and each is slightly better than the others at one thing or another. And there are even more than those when you click to expand.
One Model to Rule Them All
How on earth is any normal consumer supposed to decide which one to use for which question? I’ve always wondered: if this AI is actually smart enough to be in the top 10% of test takers sitting the most grueling exams on the planet, why can’t it also just know which model is best for your specific question and make the choice for you?
Well, now finally GPT-5 can. This model effectively brings together all the different capabilities of the previously separated specialized models into one general model. Which means if you ask it a question that requires no thinking, then as we’ve seen, it will use a very light model that’s going to be even faster than the old one. But it also means that if you then ask a heavier question that would benefit from more thinking, it can now do that thinking without you needing to specifically select a mode designed for it.
Complex Coding Challenges: Tetris and Chess
So let’s try something very complex. Make me a game of Tetris that I can play in Canvas, which means I can actually play it on the website. I’m doing this first on the old GPT-4o, the best model you’d have had access to before without paying. Still, this is easily one of the coolest use cases of an AI. We’re approaching 200 lines of code here. Okay, that took 57 seconds, and it’s very simple, about the most basic possible form of Tetris. But at the same time, I have just entered one sentence and generated a game.
So now I want to try the exact same thing but using o4 Mini High. In the previous generation, this is the model that would have been specifically recommended to you for coding, and you could only get it by paying the monthly subscription. Go. This is significantly faster. It’s done already. What? That was 14 seconds. And the game is actually almost identical to what GPT-4o produced, in a quarter of the time.
Okay, let’s try it on GPT-5. The thing I’m curious to see here is (a) does it decide this is something that would benefit from more thinking time and take longer, and (b) if it does, does that make the result better? Go. I’d say it’s coding slower than o4 Mini High, but still faster than 4o. Okay, that was actually pretty fast too: 19 seconds. Interestingly, it’s more lines of code than both of them, so it seems to be making a more complex game.
That is worlds apart. It’s the same game, but the pieces now have divides in them. It’s giving me a score. It’s giving me a level. It’s showing me the upcoming piece. There are controls at the bottom. Don’t get me wrong: none of this is something you couldn’t have achieved using previous ChatGPT models, but the fact that it had the intuition to bake it all in completely unprompted is quite impressive. It’s a vastly better game of Tetris.
While we’re here though, let’s do something crazy. Using o4 Mini High: make me a playable game of chess where the pieces are all Pokémon with sprites from the internet, and make it look premium so that the sprites fill up the squares. Go. Uh, an error occurred while trying to run your game. It’s very cool that you can ask it to fix its own bugs, but in my experience there’s a fairly low success rate. Loading this time. Error again. Come on. Third time lucky. Ah, yes. Finally. It doesn’t look terrible. I mean, it has gone out and picked up Pokémon images from the very first games in the ’90s. But it also doesn’t work: it gives me an error message every single time I touch a piece.
Same thing in GPT-5. Go. Oh, I rate that. Do you see it specifically said “thinking longer for a better answer”? It absolutely is: this has taken about three times longer than o4 Mini High. It’s actually a little bit crazy to what extent it’s going based on my simple command. You can see it coding in themes, ways to evolve these Pokémon, and seemingly an entirely new rule set that themes chess itself around Pokémon. I’m so curious what it has cooked up here.
Oh wow. Oh my god. That is worlds apart. It’s the exact same thing as with the Tetris: an implicit understanding of what makes the game look more premium. If I click a piece, it not only works, it also highlights all of the legally available moves you can make with that piece. It tells you whose turn it is at the top. It is just leagues ahead of last gen. Right now, at least, GPT-5 is feeling a lot less like a sidekick that’s only useful insofar as you have the coding knowledge yourself to fix its mistakes.
Addressing Hallucinations and Accuracy
But okay, coding is still obviously a bit of a niche use. The thing I’d say is most useful to the most people is simply making sure the AI is right. Hallucination has been a very real issue: the tendency of these chatbots to tell you, with full confidence and a straight face, something that is completely incorrect. And let’s be honest, that’s not an issue that’s suddenly going to be fixed in one generational jump.
But OpenAI says it’s better, thanks to a combination of more powerful hardware running the AI, integration of user feedback from the older models, and better ways to benchmark AI, and therefore better ways to identify where it’s lacking. Put all these things together and GPT-5 shouldn’t get things wrong as often.
Let me try one that I know ChatGPT used to struggle with, because I was asking it this two days ago. Give me 10 tech products made by food brands that I can actually buy. So, this is 4o. What? A McDonald’s x T-Mobile phone? I’m very skeptical. This is not a thing; McDonald’s did not make a phone with T-Mobile. And then it’s got the KFC gaming console, which you definitely can’t buy. You see what I mean, though? It just invents stuff with complete confidence.
Is GPT-5 any better? I’m a little skeptical. Coca-Cola mini fridge, okay. Oreo made smart speakers? Did they though? No. To be honest, GPT-5 is not doing any better a job than 4o was. It is still inventing stuff, so this still needs work. Here’s another one I know used to really trip up the older AI: what AI are you? 4o just gives me gobbledygook. What about here? What AI is this? A much more human-oriented answer: this is ChatGPT with the GPT-5 model. I specifically asked the question in a slightly obtuse way to see if it could still figure out my intentions, and it did.
Creative Tasks and Image Generation
For me, ChatGPT has been pretty good, but also pretty inconsistent, when it comes to creative tasks. So I want to give both of these something that is extremely challenging to pull off. Make me a website thumbnail for BMathz: “I Tested Every Star Wars Gadget Ever.” Off you go. It’s pretty clear why this is such a challenge: it’s got to do my face, it’s got to understand website thumbnail image sizing, and it’s got to decide which of all the potential Star Wars gadgets is the most clickable to an average user.
Right, so 4o first. The second one’s not bad. It actually has me holding a Star Wars blaster and a lightsaber. That’s pretty good; I just look quite cursed. And I don’t think those are products you can buy, they’re just objects from the universe. GPT-5, what have you done? I’d actually say that’s worse: the composition is a little less good, the text isn’t as Star Warsy, and it’s a square. Website thumbnails have to be 16 by 9.
What if you were planning a 30th birthday party and you wanted it to be Star Wars themed? How well can these make the invitations? Addressed to Rick, 7:00 p.m. to 11:00 p.m., Star Wars formal attire, and go. Well, this is taking a real long time. 4o’s is actually so good: it’s Vader in a suit, just the right level of sophisticated for a 30-year-old who happens to still be doing a Star Wars party. Maybe you could have cut one of these Star Wars logos, but apart from that, it’s great. GPT-5’s, on the other hand, is really bland. It just looks like so much less effort has been put into it than 4o’s.
So, it’s hard to make a sweeping conclusion based on just a few tests. But one thing’s pretty clear. GPT-5 is not a step up when it comes to image generation.
Writing Flair and Natural Language
But one thing that OpenAI says GPT-5 should definitely have is better writing flair, which is actually one of the things I’ve noticed lacking in the older models. They’ve always felt a little bit wooden, as if they’re focused on getting you an answer rather than thinking about the delivery of that answer.
So, how about we do the old “how soon will I be jobless” test? Write me a 30-second BMathz tech-fail script about the Windows Phone. 3, 2, 1, go. Okay, these are actually quite different-looking answers. Let’s do the 4o one first, and you’ve got to tell me how close this gets to feeling like I made it.
You ever hear the phrase “great idea, wrong decade”? That was the Windows Phone. In 2010, Microsoft launched a slick, fast, super stable phone OS that nobody wanted. Why? No Instagram, no Snapchat, no YouTube app. Even the calculator felt like a cry for help. They tried paying developers to make apps. Didn’t work. They bought Nokia for $7.2 billion to boost hardware sales. Still didn’t work. Then they deleted Nokia. By 2017, Windows Phone had just 0.1% market share. Microsoft’s official advice: switch to Android or iPhone.
To me, that feels about what you’d expect. I feel like it hasn’t quite done the best job of picking the parts of that story that were the most important to tell. Also, the calculator on Windows Phone was actually pretty great, so that’s not at all a reason it failed.
The cool thing about the GPT-5 answer is that it not only has the script, it also has notes for how to film it and what B-roll shots to use for each line. “So, this is the Windows Phone, Microsoft’s big attempt to take on iOS and Android. It looked different, it felt different, and it failed spectacularly. Turns out having no apps in 2012 was like opening a restaurant without food. Developers didn’t bother, customers didn’t care, and Microsoft just gave up. By 2017, the dream was dead, and now the only thing it’s good for is being a retro paperweight. Rest in peace, Windows Phone. You tiled too close to the sun.”

It’s good. I’d say still needs work, but that’s actually a much better starting point. And it’s using analogies in a way that actually makes you think, huh, great point.
Final Thoughts on the Launch
So ultimately, GPT-5 is not better at every single thing, but it’s still one of those tech launches that’s pretty hard to complain about. It’s clearly a lot better in a lot of ways. They’ve integrated all the complexity into one simple model, so they’re going to start phasing out the whole “this model for coding, that model for reasoning” setup. The monthly subscription is staying the same price, and you still get access to GPT-5 even if you’re not paying.
There are a couple of asterisks. If you’re not a paying user, then it’s capped: you’ll get something like 8 to 10 requests per day using the most powerful GPT-5. And it looks like when you go beyond that limit, you’ll drop down to GPT-5 Mini, a less powerful model that’s still stronger than last generation’s.