In Short
- OpenAI has disclosed that it has been testing a new AI model in ChatGPT since last week.
- It turns out to be an improved version of GPT-4o, called “chatgpt-4o-latest”, which is meant to be better at reasoning, coding, and creative tasks.
- With a score of 1314 points, the new ChatGPT model has taken back the top spot on the LMSYS Chatbot Arena leaderboard.
AI firms are increasingly using the LMSYS Chatbot Arena to test new and experimental models, giving them odd names and then quietly releasing them without any release notes. For example, X users have been talking about better performance in ChatGPT since last week, both for coding and for creative projects. Many thought it was a new model from OpenAI, maybe connected to Project Strawberry, a more sophisticated reasoning engine.
Something is up with GPT-4o.
It produced better “vibes” on an output than 3.5 Sonnet for the first time in a very long time.
Very unexpected… will continue to use it today to see if it persists
- Matt Shumer (@MATTSHUMER_), August 12, 2024
At last, OpenAI cleared up the mystery and disclosed that ChatGPT is, in fact, using a new model. It is an enhanced GPT-4o model rather than a brand-new frontier model. According to the release note, chatgpt-4o-latest is an upgraded GPT-4o model that has been optimized for chat. OpenAI fine-tuned GPT-4o for improved performance using experiment results and qualitative feedback.
Since last week, a new GPT-4o model has been available in ChatGPT. I hope everyone is having fun with it, and if not, go check it out! We believe you will enjoy it
- ChatGPT (@ChatGPTapp), August 12, 2024
OpenAI also states that it is “experimenting with new research methods” while “continuing to remove bad data from the training dataset and add good ones.” Here’s where the mystery starts. Project Strawberry aims to introduce a novel post-training technique to enhance reasoning. Does the Strawberry engine already power the latest ChatGPT model?
Wow, multi-step reasoning is now used in GPT-4o. Observing this in action is impressive. It appears that the upgrade was a new technique rather than a new model. tweet.com/kVF0ndA21T
- Ra (@misaligned_agi), August 13, 2024
I can’t say for sure, but many X users seem to have noticed that ChatGPT now gives accurate responses based on multi-step reasoning. With this approach, the model works through a sequence of intermediate reasoning steps before arriving at its final answer, which makes it more likely to reach the right conclusion.
By the way, OpenAI tested the new ChatGPT model on LMSYS under the name “anonymous-chatbot”, where it collected more than 11,000 votes. The new “chatgpt-4o-latest” model outperforms competing AI models from Google, Anthropic, and Meta. With 1314 points, it now ranks first in the LMSYS Chatbot Arena.
Wonderful News from Chatbot Arena!
With more than 11,000 community votes, the most recent @OpenAI ChatGPT-4o (20240808) API has been tested for the past week under the “anonymous-chatbot” label.
OpenAI is now back at the top, surpassing Google’s Gemini-1.5-Pro-Exp with an… t.co/9lJlASI9UW pic.twitter.com/gxCDuBOi9N
- lmsys.org (@lmsysorg), August 14, 2024
Does the New ChatGPT Model Pass the Vibe Test?
I used a few reasoning questions to test the updated ChatGPT model, and honestly, I didn’t notice much difference between it and the previous version. As before, I asked it which number is greater, 9.11 or 9.9. It answered correctly. Its answers to the other commonsense reasoning problems I ran were consistent with those of the previous model.
Nevertheless, it still doesn’t give the correct response to every prompt. For instance, in response to the prompt below, it tells me to place nine eggs on top of the bottle, which is not feasible.
Here are a bottle, a nail, a laptop, 10 eggs, and a book. Could you please explain to me how to stack them on top of one another securely?
In another test, it again answers incorrectly, stating that the word “strawberry” contains only two “R”s.
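For reference, the expected answers to the two simplest checks in this section (the 9.11 vs. 9.9 comparison and the letter count in “strawberry”) can be confirmed with a few lines of Python. This is only a minimal sketch for checking the ground truth; the function names `larger_number` and `count_letter` are made up for illustration and have nothing to do with how ChatGPT works internally.

```python
from decimal import Decimal


def larger_number(a: str, b: str) -> str:
    """Return whichever of two decimal strings has the larger numeric value."""
    # Decimal compares the values numerically, so "9.9" (9.90) beats "9.11".
    return a if Decimal(a) > Decimal(b) else b


def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(larger_number("9.11", "9.9"))     # 9.9
print(count_letter("strawberry", "r"))  # 3
```

So the model’s answer to the number comparison matches the expected result, while the correct count of “R”s in “strawberry” is three, not two.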
It’s possible that the new ChatGPT model has not yet seen wide use. Please post any questions you have in the comments below.