
Blog

Notes on ML, tooling, and building.

2026

[SWE-Smith Multilingual] Expanding to JavaScript

We expanded SWE-Smith to JavaScript with 6,099 validated patches across 74 repositories using cloud pipelines.

2026

Is Synthetic Data Good Enough to Train User Simulators?

We spent a month trying to make synthetic data work and found that 'the improvements you observe on synthetic benchmarks may simply not transfer to the real users you actually want to simulate.'

2026

The Curse of Coordination

We built CooperBench and found that adding agents halves success rates. The communication channel becomes noisy with repetition, unresponsiveness, and hallucination.

2026

The Curious Case of Miscoordination

We gave agents git access and saw only a 1-2% improvement. Tools alone don't enable collaboration without social intelligence.

Jan 20, 2025 · 20 min read

What Actually Happens Inside LLMs When You Use RL?

We peeked under the hood to see how reinforcement learning changes what's going on inside language models. Spoiler: it's way cooler than we thought.

Dec 15, 2024 · 18 min read

Can Moderation Help Multi-LLM Cooperation?

What happens when you add a neutral moderator to help LLMs cooperate in strategic games? Spoiler: it works way better than you'd think.