Tag: Research

05 April 2026

Retrieval-augmented generation (RAG) is the dominant pattern for building LLM applications over private data. You retrieve relevant documents, pass them as context to an LLM, and get a grounded answer. But which LLM should you use? Does the dataset matter? And how do you even measure whether the answers are good?
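The retrieve-then-generate loop can be sketched in a few lines. This is a minimal illustration, not the setup used in the experiments: the keyword-overlap retriever is a toy stand-in for a real embedding index, and `ask_llm` is a hypothetical placeholder for whatever LLM client you use.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def rag_answer(query: str, docs: list[str]) -> str:
    """Build a grounded prompt from the top-k retrieved documents."""
    context = "\n".join(retrieve(query, docs))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in practice: return ask_llm(prompt)
```

In a real system the retriever would be a vector store and the prompt would go to an actual model; the shape of the loop is the same.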

To answer these questions, I ran 25,600 evaluations: 5 LLMs × 6 datasets × 2 retrieval conditions × 3 random seeds, with every response scored by 10 metrics — automated text similarity, a 3-judge panel, and RAGAS framework scores. The headline finding: automated metrics and the LLM judges disagree on which models are best.
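The experimental grid above is just a cross product of the four factors. A sketch of enumerating it (the model and dataset names here are illustrative placeholders, not the ones from the study):

```python
from itertools import product

# Placeholder names standing in for the actual 5 LLMs and 6 datasets.
models = [f"model_{i}" for i in range(5)]
datasets = [f"dataset_{i}" for i in range(6)]
conditions = ["with_retrieval", "no_retrieval"]
seeds = [0, 1, 2]

# 5 x 6 x 2 x 3 = 180 run configurations; each run's responses are
# then scored by the 10 metrics to produce the full evaluation set.
runs = list(product(models, datasets, conditions, seeds))
print(len(runs))
```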

02 April 2026

Anthropic’s 2026 report says developers use AI in 60% of their work but can fully delegate only 0–20% of tasks [1]. That’s a massive gap. I dug into the public research to understand why, and the answer isn’t what most people assume.

As someone who uses AI coding agents daily, I assumed the bottleneck was model capability. It’s not. The research points to something more fundamental.