News

Katanemo Labs' new LLM routing framework aligns with human preferences and adapts to new models without retraining.
The new benchmark, called Elephant, makes it easier to spot when AI models are being overly sycophantic—but there’s no current fix.
LMArena, the company behind artificial intelligence testing service Chatbot Arena, has raised $100 million in initial funding, marking one of the largest seed rounds in the AI sector to date ...
But validity is a central theme, with particular criteria challenging designers to spell out what capability their benchmark is testing and how it relates to the tasks that make up the benchmark.
Chatbot Arena, the crowdsourced AI benchmarking project, is forming a company called Arena Intelligence Inc., reports Bloomberg.
Boston and Paris – April 3, 2025 – Fault-tolerant quantum computing company Alice & Bob today announces its selection as a performer in the U.S. Defense Advanced Research Projects Agency’s (DARPA) ...
Benchmarks can be used to put large language models to the test. Read on for some tips on how to do it right.
Anthropic used Pokémon to benchmark its newest AI model. Yes, really. In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic ...
Ronca goes on to explain that in the course of benchmarking AV1 integration into Android, Meta has developed VCAT (Video Codec Acid Tests), a new tool for benchmarking hardware and software decoders ...
The Innovation Center is committed to an ongoing cycle of designing, refining, and testing new benchmarking methodologies, particularly as we learn from ongoing model tests. This Forefront ...