If you asked the general public what the best AI model is, there’s a good chance most people would answer ChatGPT. While there are many players on the scene in 2024, OpenAI is the one that has truly broken through and introduced powerful generative AI to the masses. Fittingly, ChatGPT’s large language model (LLM), GPT, has consistently topped its competitors, from the introduction of GPT-3.5, through GPT-4, and now GPT-4 Turbo.
However, the tide seems to be turning: this week, Anthropic’s Claude 3 Opus LLM overtook GPT-4 for the first time in the Chatbot Arena, prompting app creator Nick Dobos to declare: “The king is dead.” At the time of writing, Claude still holds the edge: Claude 3 Opus has an Arena Elo rating of 1253, while GPT-4-1106-preview sits at 1251, closely followed by GPT-4-0125-preview at 1248.
For what it’s worth, Chatbot Arena ranks all three LLMs in first place, but Claude 3 Opus has a slight edge.
Anthropic’s other LLMs are also performing well. Claude 3 Sonnet is tied with Google’s Gemini Pro for fourth place, while Claude 3 Haiku, Anthropic’s lighter and faster LLM, ranks just below them, only slightly ahead of the 0613 version of GPT-4.
How Chatbot Arena evaluates LLMs
To rank the various LLMs currently available, Chatbot Arena asks users to enter a prompt and judge how two different, unnamed models respond. Users can continue the conversation until they decide which model performs better. Users don’t know which models they’re comparing (Claude might be pitted against ChatGPT, Gemini against Meta’s Llama, and so on), which eliminates any brand-preference bias.
However, unlike other types of benchmarks, there is no real rubric by which users evaluate the anonymous models. Users simply decide for themselves which LLM performs better, based on whatever criteria they care about. As AI researcher Simon Willison put it in an interview with Ars Technica, much of what makes one LLM seem better than another in users’ eyes comes down to “vibes” more than anything else. If you like the way Claude responds better than ChatGPT, that’s all that matters.
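Those Arena Elo ratings are computed from the accumulated head-to-head votes, much like chess ratings. As a rough illustration, here is a minimal sketch of a standard Elo update in Python; the K-factor of 32 and the exact update rule are common textbook defaults, not necessarily what Chatbot Arena itself uses:

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Predicted probability that model A beats model B,
    # given their current ratings (standard Elo formula).
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    # Shift both ratings toward the observed result.
    # k controls how much a single vote moves the ratings;
    # 32 is an illustrative choice, not Chatbot Arena's setting.
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1 - s_a) - (1 - e_a))
    return new_a, new_b

# Two closely rated models (the ratings cited above): a single win
# moves the winner up only slightly, since an upset was not expected.
claude, gpt4 = update_elo(1253, 1251, a_won=True)
```

Because the ratings are so close, each vote nudges them by only a few points, which is why the top models can trade places from week to week.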
Above all, it is a testament to how powerful these LLMs have become. If you had run the same test a few years ago, you would probably have relied on more standardized metrics to determine which LLM was stronger, whether speed, accuracy, or consistency. Now Claude, ChatGPT, and Gemini have become so good that they are almost interchangeable, at least for everyday use of generative AI.
While it’s impressive that Claude surpassed OpenAI’s LLM for the first time, what’s perhaps even more impressive is that GPT-4 stayed on top this long. The LLM itself is already a year old, setting aside iterative updates like GPT-4 Turbo, while Claude 3 was released just this month. Who knows what will happen when OpenAI introduces GPT-5, which, at least according to one anonymous CEO, is “…really good, like materially better.” For now, there are many generative AI models, each nearly as effective as the next.
Chatbot Arena has gathered over 400,000 human votes to rank these LLMs. You can try the test yourself and add your vote to the rankings.