Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on a crowdsourced benchmark, LM Arena. The incident prompted the maintainers of LM Arena to apologize, change their policies, and score the unmodified, vanilla Maverick.

Turns out, it’s not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” was ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well to LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we’ve written before, LM Arena has, for various reasons, never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark is misleading, and it also makes it hard for developers to predict how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat optimized version we experimented with that also performs well on LMArena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”




