autoresearch (github.com)
Autoresearch
AutoResearch by Andrej Karpathy
AutoResearch is an open-source framework that enables AI agents to autonomously conduct machine learning research without human intervention. Released in March 2026, this ~630-line Python tool implements a closed-loop research system where AI agents design experiments, modify code, train models, evaluate results, and iterate—all automatically.
The system operates on a fixed 5-minute training cycle, which works out to approximately 12 experiments per hour. AI agents propose changes to the training code, execute them, and keep only the modifications that improve validation performance (measured in bits per byte, where lower is better). When a run fails, the framework feeds the GPU error back to the LLM so it can revise the code, which makes the research loop self-correcting.
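The keep-only-what-improves loop described above can be sketched as follows. This is a minimal illustration, not the code from the repository: `research_loop`, `propose`, `run`, `bits_per_byte`, and `experiments_per_hour` are hypothetical names, and the agent and training run are passed in as plain callables so the control flow is visible without an LLM or a GPU.

```python
import math
from typing import Callable, List, Optional, Tuple


def experiments_per_hour(budget_minutes: float) -> float:
    """How many fixed-budget runs fit in one hour (5 minutes -> 12)."""
    return 60.0 / budget_minutes


def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss in nats to bits per byte."""
    return total_nats / (math.log(2) * total_bytes)


def research_loop(
    propose: Callable[[Optional[str]], str],  # hypothetical: agent proposes a change
    run: Callable[[str], float],              # hypothetical: train + eval, returns bpb
    baseline_bpb: float,
    n_experiments: int,
) -> Tuple[float, List[str]]:
    """Keep a change only if it lowers validation bits per byte."""
    best = baseline_bpb
    kept: List[str] = []
    feedback: Optional[str] = None  # what the agent sees from the previous run

    for _ in range(n_experiments):
        change = propose(feedback)
        try:
            bpb = run(change)
        except RuntimeError as err:
            # Assumption: failures surface as exceptions; the message is
            # handed back to the agent so the next proposal can fix it.
            feedback = f"error: {err}"
            continue

        if bpb < best:
            best = bpb
            kept.append(change)
            feedback = f"kept: {bpb:.4f}"
        else:
            feedback = f"discarded: {bpb:.4f}"

    return best, kept
```

In a real setup, `propose` would call an LLM and `run` would launch the time-boxed training job. The sketch only shows the accept/reject rule and the error feedback path.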
What sets AutoResearch apart is its scope: it gives AI agents freedom to modify arbitrary code rather than just tune predefined hyperparameters. Researchers describe research directions in Markdown, point the agent at the repository, and let it run overnight; the result is a git history containing only the changes that passed validation. This represents a shift from "vibe coding" (human prompts, AI writes, human reviews) to fully autonomous research where humans set direction and agents execute independently.
Karpathy frames this as part of a broader evolution in AI-assisted development, where the bottleneck shifts from raw model capability to creating effective feedback signals for continuous improvement. The vision extends beyond single-agent optimization to massively collaborative systems where multiple AI agents explore different research directions in parallel—emulating an entire research community rather than a single PhD student.
From Andrej Karpathy:
Been trying to think of ways to leverage this. There have been some amazing examples so far:
Tobi, founder of Shopify, made their template system 53% faster (keep in mind the template system has been around for 20 years)
Cheng Lou made a leap forward in UI engineering with a text measurement algo in pure TS called pretext (I can’t tell if he used autoresearch specifically, but the footprints are there in the repo)
What else is autoresearch going to push forward?