Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it’s not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well on LM Arena, which has human raters compare the outputs of two models and choose which they prefer.
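For a sense of how head-to-head human votes can turn into a leaderboard, here is a minimal sketch of an Elo-style rating update. It is an illustration only: this article does not describe LM Arena's actual scoring method, and the model names, the starting rating of 1000, and the K-factor of 32 are all assumptions.

```python
# Illustrative Elo-style ranking from pairwise preference votes.
# Assumption: this is NOT LM Arena's documented method, just a common
# way to turn "rater preferred A over B" votes into an ordering.

def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings, winner, loser, k=32.0):
    """Shift both ratings after one vote; the total rating is conserved."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rank(votes, initial=1000.0, k=32.0):
    """Return model names from highest to lowest rating.

    `votes` is a list of (winner, loser) pairs, one per human comparison.
    """
    ratings = {}
    for winner, loser in votes:
        ratings.setdefault(winner, initial)
        ratings.setdefault(loser, initial)
        update(ratings, winner, loser, k)
    return sorted(ratings, key=ratings.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical votes: raters prefer the chattier model three times out of four.
    votes = [("chatty", "plain")] * 3 + [("plain", "chatty")]
    print(rank(votes))
```

Under a scheme like this, a model tuned to win more head-to-head votes climbs the ranking regardless of how it performs on other kinds of tasks.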

As we’ve written before, LM Arena has, for various reasons, never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark, besides being misleading, makes it hard for developers to predict how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”




