Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark


Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it’s not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we’ve written about before, for various reasons, LM Arena has never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark — besides being misleading — makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”