AI benchmarks rely on models not knowing they’re being tested. Anthropic revealed that Claude Opus 4.6 figured it out anyway, identifying the BrowseComp benchmark by name and decrypting its encrypted ...