Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it’s not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

The release version of Llama 4 has been added to LMArena after it was found out they cheated, but you probably didn’t see it because you have to scroll down to 32nd place which is where is ranks pic.twitter.com/A0Bxkdx4LX

— ρ:ɡeσn (@pigeon__s) April 11, 2025

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we’ve written about before, LM Arena has never been the most reliable measure of an AI model’s performance, for a variety of reasons. Still, tailoring a model to a benchmark, besides being misleading, makes it challenging for developers to predict exactly how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”
