
Meta exec denies the company artificially boosted Llama 4’s benchmark scores



A Meta exec on Monday denied a rumor that the company trained its new AI models to present well on specific benchmarks while concealing the models’ weaknesses.

The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it’s “simply not true” that Meta trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.” In AI benchmarks, test sets are collections of data used to evaluate the performance of a model after it’s been trained. Training on a test set could misleadingly inflate a model’s benchmark scores, making the model appear more capable than it actually is.
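To see why training on a test set inflates scores, consider a minimal sketch in Python. It is purely illustrative and says nothing about how Meta trains or evaluates Llama 4: the "model" here is a hypothetical lookup table that memorizes the examples it has seen and otherwise guesses the most common label, and the question IDs and labels are made up.

```python
from collections import Counter

def train_memorizer(examples):
    """Toy 'model': memorize every (question, answer) pair seen in training,
    and fall back to the most common training answer for unseen questions."""
    table = dict(examples)
    fallback = Counter(answer for _, answer in examples).most_common(1)[0][0]
    return table, fallback

def accuracy(model, test_set):
    """Fraction of test questions the model answers correctly."""
    table, fallback = model
    correct = sum(1 for q, a in test_set if table.get(q, fallback) == a)
    return correct / len(test_set)

# Hypothetical data, invented for illustration only.
train_set = [("q1", "yes"), ("q2", "yes"), ("q3", "no")]
test_set = [("q4", "no"), ("q5", "no"), ("q6", "yes"), ("q7", "no")]

# Clean run: the model never sees the test questions during training.
clean_model = train_memorizer(train_set)
# Contaminated run: the test questions leak into the training data.
contaminated_model = train_memorizer(train_set + test_set)

clean_score = accuracy(clean_model, test_set)
contaminated_score = accuracy(contaminated_model, test_set)

print(f"clean: {clean_score:.2f}")
print(f"contaminated: {contaminated_score:.2f}")
```

The contaminated model scores perfectly on the benchmark without being any better at answering questions it has not seen, which is the kind of misleading result the rumor alleged.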

Over the weekend, an unsubstantiated rumor that Meta artificially boosted its new models’ benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site by a user who claimed to have resigned from Meta in protest over the company’s benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta’s decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed stark differences between the behavior of the publicly downloadable Maverick and that of the model hosted on LM Arena.

Al-Dahle acknowledged that some users are seeing “mixed quality” from Maverick and Scout across the different cloud providers hosting the models.

“Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in,” Al-Dahle said. “We’ll keep working through our bug fixes and onboarding partners.”


