Language Testing Methods

Qwen3-Max Thinking beats Gemini 3 Pro and GPT-5.2 on Humanity's Last Exam (with search)

On HMMT Feb 25, a rigorous reasoning benchmark, Qwen3-Max-Thinking scored 98.0, edging out Gemini 3 Pro (97.5) and ...

New method helps AI reason like humans without extra training data

A study led by UC Riverside researchers offers a practical fix to one of artificial intelligence's toughest challenges by ...

UC San Diego Today

From Chatbots to Dice Rolls: Researchers Use D&D to Test AI’s Long-term Decision-making Abilities

Indeed, D&D’s complex rules, extended campaigns and need for teamwork are an ideal environment to evaluate the long-term ...

GitHub

Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models

Recent breakthroughs in large language models (LLMs) on complex reasoning tasks have been largely driven by Test-Time Scaling (TTS) — a paradigm that enhances reasoning by intensifying inference-time ...

Medical Xpress

New method accelerates resistance testing in urinary tract infections

Researchers at the Technical University of Munich (TUM) have developed a method for diagnosing urinary tract infections that significantly accelerates antibiotic resistance testing in urine. Because ...

Science Daily

New PFAS testing method created

Researchers have discovered a new way to detect per- and polyfluoroalkyl substances (PFAS) in water. This marks an important step forward in creating testing devices that are simpler, more ...

unite

A ‘Zen’ Method to Stop Language Models from Hallucinating

Telling ChatGPT to fact-check a random answer before solving an actual problem makes it think harder, and get the answer right more often – even if the earlier ‘random’ answer has nothing to do with ...

Frontiers

Testing network clustering algorithms with natural language processing

Introduction: We propose a hybrid methodology to evaluate the alignment between structural communities inferred from interaction networks and the linguistic coherence of users' textual production in ...

The PIE News

Keeping English language testing relevant in the AI era

As AI reshapes how we study, work, and communicate, questions are being asked about the future of English language learning and testing. If translation tools and generative text can produce fluent ...

Hartford Courant

Readers speak: We can replace antiquated animal testing with modern methods

With the National Institutes of Health shifting funding toward human-relevant, non-animal science, Connecticut’s leadership in bioscience has a timely opportunity to champion research methods that ...

GitHub

Efficient Test-Time Scaling for Small Vision-Language Models

Our framework consists of two main pipelines: (1) Test-Time Augmentation: Given an input image and text prompt, we apply various transformations to create multiple augmented versions. VLM processes ...
