AutoResearch by Andrej Karpathy

AutoResearch is an open-source framework that lets AI agents conduct machine learning research autonomously. Released in March 2026, the ~630-line Python tool implements a closed research loop in which agents design experiments, modify code, train models, evaluate results, and iterate without human intervention.

The system operates on a fixed 5-minute training cycle, running roughly 12 experiments per hour. AI agents propose changes to the training code, execute them, and keep only modifications that improve validation performance (measured in bits per byte). The framework also feeds GPU errors back to the LLM for code revision, creating a self-correcting research loop.
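The propose-train-keep loop above can be sketched in a few lines. Everything here is a toy stand-in, not the repo's actual code: `agent_propose` fakes the LLM's code edit as a learning-rate tweak, and `train_5min` fakes a training run with a synthetic bits-per-byte score and a simulated GPU error path.

```python
import random

# Hypothetical stand-ins for the real components (names invented for this sketch).
def agent_propose(params, error_feedback):
    """Fake LLM edit: nudge the learning rate; error_feedback would go in the prompt."""
    new = dict(params)
    new["lr"] = new["lr"] * random.choice([0.5, 0.9, 1.1, 2.0])
    return new

def train_5min(params):
    """Fake fixed-length training run returning a validation bits-per-byte score."""
    if params["lr"] > 1.0:
        raise RuntimeError("CUDA error: loss diverged")  # simulated GPU failure
    return 1.0 + abs(params["lr"] - 0.01)  # toy metric, minimized at lr = 0.01

def research_loop(params, runs=12):  # ~12 experiments per hour
    best_bpb = train_5min(params)
    feedback = None
    for _ in range(runs):
        candidate = agent_propose(params, feedback)
        try:
            bpb = train_5min(candidate)
        except RuntimeError as exc:
            feedback = str(exc)   # feed the error back for code revision
            continue
        if bpb < best_bpb:        # keep only changes that improve validation
            params, best_bpb, feedback = candidate, bpb, None
    return params, best_bpb
```

The key property is that `best_bpb` can only go down: rejected proposals and failed runs leave the accepted state untouched, which is what makes the loop safe to run unattended.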

What makes AutoResearch distinctive is its scope: agents are free to modify arbitrary code rather than just tune predefined hyperparameters. Researchers describe research directions in markdown, point the agent at the repository, and the system generates a git history of validated improvements overnight. This marks a shift from "vibe coding" (human prompts, AI writes, human reviews) to autonomous research in which humans set the direction and agents execute independently.

Karpathy frames this as part of a broader evolution in AI-assisted development, where the bottleneck shifts from raw model capability to creating effective feedback signals for continuous improvement. The vision extends beyond single-agent optimization to massively collaborative systems where multiple AI agents explore different research directions in parallel—emulating an entire research community rather than a single PhD student.


autoresearch (github.com) (via).

From Andrej Karpathy:


I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:

  • the human iterates on the prompt (.md)

  • the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc.

github.com/karpathy/autorese…
Part code, part sci-fi, and a pinch of psychosis :)
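The git side of the loop Karpathy describes, where each validated improvement lands as a commit on a feature branch, could look roughly like this. This is a sketch under assumptions: the branch name, file contents, and commit messages are invented, and a throwaway repo stands in for the real training repo.

```python
import pathlib
import subprocess
import tempfile

def run(args, cwd):
    subprocess.run(args, cwd=cwd, check=True, capture_output=True)

def commit_improvement(repo, script, new_code, bpb):
    """Overwrite the training script and record the win with its metric."""
    (repo / script).write_text(new_code)
    run(["git", "add", script], repo)
    run(["git", "commit", "-m", f"val bpb improved to {bpb:.4f}"], repo)

# Demo in a throwaway repo on a feature branch.
repo = pathlib.Path(tempfile.mkdtemp())
run(["git", "init"], repo)
run(["git", "checkout", "-b", "agent/overnight-run"], repo)  # invented branch name
run(["git", "config", "user.email", "agent@example.com"], repo)
run(["git", "config", "user.name", "agent"], repo)
(repo / "train.py").write_text("lr = 3e-4\n")
run(["git", "add", "train.py"], repo)
run(["git", "commit", "-m", "baseline"], repo)
commit_improvement(repo, "train.py", "lr = 1e-3\n", 1.2345)
```

Because only accepted changes are committed, `git log` on the feature branch becomes exactly the "git history of validated improvements" described above: one commit per experiment that beat the previous best.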


Been trying to think of ways to leverage this; there have already been some amazing examples. What else is autoresearch going to push forward?