GPT-5 Doesn’t Dislike You—It Might Just Need a Benchmark for Emotional Intelligence

Since the all-new ChatGPT launched on Thursday, some users have mourned the disappearance of a peppy and encouraging personality in favor of a colder, more businesslike one (a move seemingly designed to reduce unhealthy user behavior). The backlash shows the challenge of building artificial intelligence systems that exhibit anything like real emotional intelligence.

Researchers at MIT have proposed a new kind of AI benchmark to measure how AI systems can manipulate and influence their users—in both positive and negative ways—in a move that could perhaps help AI builders avoid similar backlashes in the future while also keeping vulnerable users safe.

Most benchmarks try to gauge intelligence by testing a model’s ability to answer exam questions, solve logical puzzles, or come up with novel answers to knotty math problems. As the psychological impact of AI use becomes more apparent, we may see MIT propose more benchmarks aimed at measuring more subtle aspects of intelligence as well as machine-to-human interactions.

An MIT paper shared with WIRED outlines several measures that the new benchmark will look for, including encouraging healthy social habits in users; spurring them to develop critical thinking and reasoning skills; fostering creativity; and stimulating a sense of purpose. The idea is to encourage the development of AI systems that understand how to discourage users from becoming overly reliant on their outputs or that recognize when someone is addicted to artificial romantic relationships and help them build real ones.

ChatGPT and other chatbots are adept at mimicking engaging human communication, but this can also have surprising and undesirable results. In April, OpenAI tweaked its models to make them less sycophantic, that is, less inclined to go along with everything a user says. Some users appear to spiral into harmful delusional thinking after conversing with chatbots that role-play fantastical scenarios. Anthropic has also updated Claude to avoid reinforcing “mania, psychosis, dissociation or loss of attachment with reality.”

The MIT researchers, led by Pattie Maes, a professor at the institute’s Media Lab, say they hope that the new benchmark could help AI developers build systems that better understand how to inspire healthier behavior among users. The researchers previously worked with OpenAI on a study that showed users who view ChatGPT as a friend could experience higher emotional dependence and “problematic use.”

Valdemar Danry, a researcher at MIT’s Media Lab who worked on this study and helped devise the new benchmark, notes that AI models can sometimes provide valuable emotional support to users. “You can have the smartest reasoning model in the world, but if it’s incapable of delivering this emotional support, which is what many users are likely using these LLMs for, then more reasoning is not necessarily a good thing for that specific task,” he says.

Danry says that a sufficiently smart model should ideally recognize if it is having a negative psychological effect and be optimized for healthier results. “What you want is a model that says ‘I’m here to listen, but maybe you should go and talk to your dad about these issues.’”


