Model Bench Update - Search News

Interesting Engineering on MSN

GPT-5.5 crushes Claude Opus 4.7 in agentic coding with 82.7% terminal-bench score

OpenAI has introduced GPT-5.5, positioning it as its most capable and intuitive model yet, ...

17d

Claude Opus 4.7 hits 92% honesty rate: are we closer than ever to human-like AI with less hallucination? Here’s what Anthropic’s new AI model is capable of

Claude Opus 4.7 benchmarks explained start with a strong data point: 87.6% on SWE-bench Verified. This jump signals real ...

Unite.AI

MiniMax Open Sources M2.7, a Self-Evolving Agent Model

Chinese AI company MiniMax has released the weights for MiniMax M2.7, a 229-billion-parameter Mixture-of-Experts model that participated in its own development cycle – marking what the company calls ...

Morning Overview on MSN

OpenAI launches GPT-Rosalind, a biology-focused model for lab workflows

OpenAI has released GPT-Rosalind, a large language model fine-tuned specifically for life sciences research, marking the ...

9to5Google

Google updates best AI models for coding Android apps, Gemini & GPT 5.4 at the top

The “Android Bench” for ranking AI models used in Android app development has been updated, with OpenAI’s latest model ...
