Problem Difficulty CodeChef

Figuring out why AIs get flummoxed by some games

While beating an AI at a board game may seem relatively trivial, it can help us identify failure modes of the AI, or ways in which we can improve their training to avoid having them develop these ...

AI can rewrite open source code—but can it rewrite the license, too?

Computer engineers and programmers have long relied on reverse engineering as a way to copy the functionality of a computer ...

Inside OpenAI’s Race to Catch Up to Claude Code

Anthropic, a smaller rival started by OpenAI defectors, has found runaway success with its programming agent, Claude Code.

Decrypt

There's a Benchmark Test That Measures AI 'Bullshit'—Most Models Fail

BullshitBench tests whether AI models can detect nonsensical questions—or if they'll confidently answer them anyway. The ...

Claude Opus 4.6 Discovers 22 Firefox Vulnerabilities in Just Two Weeks

Claude AI discovered 22 Firefox vulnerabilities in two weeks, including 14 high severity flaws, showing how AI speeds up security research.

3d on MSN

AI-enabled quantum refinement cracks the code of difficult-to-map proteins

Using a tool to solve a protein's structure, for most researchers in the world of structural biology and computational chemistry, is not unlike using the Rosetta Stone to unlock the secrets of ancient ...

Earth.com

How AI learned a complex coding language nobody taught it

Researchers show AI can learn a rare programming language by correcting its own errors, improving its coding success from 39% to 96%.

3d on MSN

AI is getting scary good at finding hidden software bugs - even in decades-old code

3d on MSN

I tried GPT-5.4, and most answers were really good - but a few had me concerned

Anthropic and OpenAI just exposed SAST's structural blind spot with free tools

Can free AI scanners replace enterprise SAST? Anthropic and OpenAI found 500-plus zero-days pattern-matching tools missed — and both scanners are free.

6 Claude Code Levels Explained : Map Skills from Prompts to Agent Teams

This Claude Code roadmap defines six levels of skill. Flags context rot and suggests resets, shaping more reliable sessions ...
