Ever since o1 came out, the biggest complaint about it has been that it's "too verbose."

I just wanted to fix a simple bug, and it gave me three background explanations, two solution approaches plus error handling, and even wished me good luck at the end.

I was only looking for a spelling error on line 12, but ended up getting a forced refresher on Python naming conventions.

This is RLHF's fault. Annotators tend to give higher scores to longer responses, thinking more text looks more professional.

So the model desperately piles on "seemingly useful" filler, while the actual core information gets diluted.

Compare Claude: it's much more tactful about this and seems to know what response length fits which question.

What really hurts is the wallet. o1's output pricing is $60 per 1M tokens, so when an answer that needs 100 tokens gets padded to 500, you pay five times as much for it.
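To put numbers on that, here is a quick back-of-the-envelope sketch. It assumes only the $60 per 1M output tokens price quoted above and the post's illustrative 100 vs. 500 token figures; it ignores input tokens entirely, and the helper name `output_cost` is just for illustration.

```python
# Back-of-the-envelope output cost, assuming the $60 per 1M output tokens
# price quoted in the post. Input tokens are ignored.
PRICE_PER_MILLION_OUTPUT_TOKENS = 60.0  # USD, assumed from the post


def output_cost(tokens: int) -> float:
    """Dollar cost of `tokens` output tokens at the assumed rate."""
    return tokens * PRICE_PER_MILLION_OUTPUT_TOKENS / 1_000_000


concise = output_cost(100)  # an answer that really needed ~100 tokens
padded = output_cost(500)   # the same answer padded to ~500 tokens

print(f"concise: ${concise:.4f}")
print(f"padded:  ${padded:.4f}")
print(f"ratio:   {padded / concise:.1f}x")
```

Per answer the difference is fractions of a cent ($0.006 vs. $0.03), but it scales linearly: 1,000 padded answers cost about $30 instead of $6.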

Now you have to tack an explicit instruction like "just the code" onto your prompt, and even that doesn't always work.

Where the model stands right now: extremely high IQ, zero emotional intelligence. It has no idea when to just shut up.