The Karpathy Skill: Four Rules That Make You a Better AI-Era Engineer
Most developers use AI coding assistants wrong.
They fire off a vague prompt, accept the first 200-line response, and then spend the next hour debugging why the "helpful" code broke three other things. The tool is fast — the outcome is slow.
Andrej Karpathy, the Stanford AI researcher who co-founded OpenAI, led Tesla's Autopilot team, and built nanoGPT from scratch, distilled decades of engineering intuition into four rules that are now circulating as a CLAUDE.md template for AI coding assistants. These rules are not about AI. They are about how senior engineers think, and they happen to be exactly what you need to get great results from any AI coding tool in 2026.
This post unpacks every rule with the why, concrete examples, and practical habits you can start using today.
Why This Matters in 2026
AI coding assistants (Claude Code, GitHub Copilot, Cursor) have made it trivially easy to generate code. What they have not solved is judgment — knowing when not to write code, when a solution is too complex, and when to stop and ask instead of guess.
The bottleneck has shifted. In 2020, the bottleneck was typing speed. In 2026, it is clarity of intent. If you cannot articulate precisely what you want, the AI produces something plausible-looking but wrong. The four Karpathy rules train exactly that skill.
Rule 1: Think Before Coding
Don't assume. Don't hide confusion. Surface tradeoffs.
The Principle
Before any implementation — whether you are writing code yourself or directing an AI — you must state your assumptions, surface ambiguities, and name the tradeoffs. Confusion left unstated becomes a bug that ships.
Why It Matters
AI assistants are trained to be helpful, which means they will fill gaps in your prompt with plausible-sounding guesses. If you ask "add user authentication," the model will pick an approach — JWT or sessions, database or in-memory, email or OAuth — without telling you it made a choice. By the time you notice, you have a system built on assumptions you never validated.
Senior engineers do the same thing in reverse: they stop before implementing and write out what they think is being asked, where they see multiple paths, and what they need clarified. This surfaces disagreements early, when a conversation is cheap and a rewrite is not.
In Practice
Before prompting an AI assistant:
```text
Bad:  "Add rate limiting to the API"

Good: "Add rate limiting to the POST /api/login endpoint.
       My assumption: we limit by IP, 5 requests per minute,
       using an in-memory store (Redis not available in this env).
       If that's wrong, tell me before writing anything."
```
The habit: Before you write a single line of code or send a single prompt, write one sentence stating your biggest assumption. If you can't name it, you haven't thought about it yet.
Checklist before implementation:
- [ ] What am I assuming about inputs, environment, and state?
- [ ] Is there a simpler interpretation of this request I'm ignoring?
- [ ] Where are the two or three paths this could go, and which am I choosing and why?
- [ ] What would I ask in a code review that I haven't answered yet?
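To make the idea concrete, here is a minimal sketch of what the "Good" prompt above might produce when its assumptions are carried into the code as named constants and comments. The function name `allow_request` and the sliding-window approach are illustrative choices, not something the original rules prescribe.

```python
import time
from collections import defaultdict, deque
from typing import Optional

# Assumptions stated up front, so a reviewer can challenge them:
MAX_REQUESTS = 5        # assumption: 5 requests...
WINDOW_SECONDS = 60.0   # ...per minute
# Assumption: limiting is per client IP, with an in-memory store.
# State is lost on restart and is not shared across processes.
_hits = defaultdict(deque)  # ip -> timestamps of recent allowed requests


def allow_request(ip: str, now: Optional[float] = None) -> bool:
    """Return True if this IP is still under the limit (sliding window)."""
    now = time.monotonic() if now is None else now
    window = _hits[ip]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False
    window.append(now)
    return True
```

Because each assumption has a name, changing "5 per minute" or swapping the store later is a visible, reviewable edit rather than a hidden guess.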
Rule 2: Simplicity First
Minimum code that solves the problem. Nothing speculative.
The Principle
Write the fewest lines that correctly solve the stated problem. No abstractions for code that runs in one place. No flexibility no one asked for. No error handling for scenarios that cannot happen. If the result looks clever, it is probably wrong.
Why It Matters
AI tools have a strong bias toward completeness. Ask for a function that parses a config file, and the model will add validation, fallback defaults, logging hooks, a factory pattern, and a BaseConfigParser class, likely because comprehensive-looking answers were rewarded during training. The result runs. It also adds 150 lines of surface area you must now maintain, test, and understand.
The real trap is that speculative code feels responsible. "What if we need to support multiple formats later?" Almost always, you won't — and even if you do, the right time to write that code is when you need it, not before.
In Practice
The anti-pattern:
```python
# Asked for: "parse a JSON config file"
from typing import Callable


class ConfigParser:
    def __init__(self, format: str = "json"):
        self._format = format
        self._validators: list[Callable] = []
        self._defaults: dict = {}

    def with_defaults(self, defaults: dict) -> "ConfigParser":
        self._defaults = defaults
        return self

    def add_validator(self, fn: Callable) -> "ConfigParser":
        self._validators.append(fn)
        return self

    def parse(self, path: str) -> dict:
        ...  # 40 more lines
```
The Karpathy way:
```python
# Asked for: "parse a JSON config file"
import json
from pathlib import Path


def load_config(path: str) -> dict:
    return json.loads(Path(path).read_text())
```
Two imports and a two-line function. Does exactly what was asked. Easy to read, test, and delete if requirements change.
The Senior Engineer Test
Before finishing any implementation, ask: "Would a senior engineer say this is overcomplicated?"
If the answer is yes — or even "maybe" — cut it. Real senior engineers regularly delete their own code mid-PR because a simpler approach surfaced. The discipline to do that is the skill.
Rules for AI prompts:
- Add "be minimal" or "no abstractions unless asked" to your system prompt
- If the response is longer than expected, ask: "Can this be done in half the lines?"
- Do not accept a class when a function will do
- Do not accept a function when a variable will do
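The last two rules are easiest to see side by side. The following is a hypothetical illustration (the names `RetryPolicy`, `get_max_retries`, and `MAX_RETRIES` are invented for this sketch) of the same fixed setting expressed at three levels of ceremony:

```python
# Over-built: a class wrapping one value that never varies.
class RetryPolicy:
    def __init__(self, max_retries: int = 3):
        self.max_retries = max_retries

    def get_max_retries(self) -> int:
        return self.max_retries


# Still more than needed: a function that always returns the same constant.
def get_max_retries() -> int:
    return 3


# Enough, as long as the value is fixed and used as-is.
MAX_RETRIES = 3
```

All three give callers the same number. Only the last one has nothing to test, mock, or subclass. If the value later needs to vary per caller, that is the moment to reach for a function or class.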
Rule 3: Surgical Changes
Touch only what you must. Clean up only your own mess.
The Principle
When modifying existing code, change only the lines directly required by the task. Do not improve adjacent comments. Do not refactor things that aren't broken. Do not adopt a different style. Every changed line must trace back to the user's request — otherwise, it shouldn't be there.
The one exception: if your own changes create orphaned imports, dead variables, or unused functions, clean those up. But leave everything else exactly as you found it.
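One rough way to tell "my orphans" from "someone else's" is to compare unused imports before and after your change. The sketch below uses Python's standard `ast` module. The helper names `unused_imports` and `orphans_created` are hypothetical, and the check is a heuristic (it ignores `__all__`, star imports, and names used only in strings), so treat it as an illustration rather than a replacement for a real linter.

```python
import ast


def unused_imports(source: str) -> set[str]:
    """Names bound by import statements that are never referenced."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return imported - used


def orphans_created(before: str, after: str) -> set[str]:
    """Imports that became unused because of this change (yours to clean up)."""
    return unused_imports(after) - unused_imports(before)
```

Anything `orphans_created` reports is fair game for cleanup. Imports that were already unused before your change belong in a separate task.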
Why It Matters
AI assistants love to improve things. Ask them to fix a bug in a function, and they will also rename variables to be "clearer," add type hints, reformat the docstring, and reorganize the logic while they're at it. Each individual change seems reasonable. Together, they produce a diff that's 80% noise and 20% the actual fix — making code review slower, merge conflicts more likely, and debugging harder when something breaks.
This is the classic "while you're in there" trap that causes regressions. The blast radius of a change should be exactly as large as the task requires. No larger.
In Practice
The anti-pattern (asked to fix a typo in a function name):
```diff
- def calc_totl(items):
-     result = 0
-     for i in items:
-         result += i
-     return result
+ def calculate_total(items: list[float]) -> float:
+     """Calculate the sum of all items."""
+     return sum(items)
```
This renamed the function to something other than the requested fix, added type hints, added a docstring, and replaced the loop with sum(). The ask was a typo fix. Now callers must move to a name nobody asked for, and the one-character fix is buried in the diff.
The Karpathy way:
```diff
- def calc_totl(items):
+ def calc_total(items):
      result = 0
      for i in items:
          result += i
      return result
```
One line changed. The typo is fixed. Nothing else moved. Call sites that reference the old name need the same one-token rename, and nothing more.
The Test for Every Line
For every line in your diff, ask: "Does this trace directly to the user's request?"
If the answer is no — even if the change is an improvement — revert it. File it as a separate task if it genuinely matters. Don't bundle it.
For AI-assisted edits:
- Use "make the minimum change needed" in your prompt
- After receiving a response, audit every changed line against the original request
- If you see unrequested style changes, ask the model to revert them explicitly
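The audit step can be partly mechanized. As a minimal sketch, assuming you have the file contents before and after the edit as strings, the standard library's `difflib.ndiff` can list exactly which lines were touched. The helper name `changed_lines` is made up for this example.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the removed ("- ") and added ("+ ") lines between two versions."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


original = (
    "def calc_totl(items):\n"
    "    result = 0\n"
    "    for i in items:\n"
    "        result += i\n"
    "    return result\n"
)
surgical = original.replace("calc_totl", "calc_total")

for line in changed_lines(original, surgical):
    print(line)
```

For the surgical edit, the only lines listed are the old and new `def` lines. A response that lists many more lines than the request implies is a signal to read the diff closely before accepting it.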
Rule 4: Goal-Driven Execution
Define success criteria. Loop until verified.
The Principle
Before implementing anything, transform the task into a verifiable goal. A verifiable goal has a clear pass/fail criterion you can check mechanically — a test, a log output, a rendered UI state. Then implement until the criterion is met. Not until it "looks right." Until it is right by the definition you wrote before you started.
Why It Matters
"Done" is dangerously subjective. "I fixed the bug" can mean "I can no longer reproduce it locally in my specific test case." That's not done. Done means: the test that reproduces the bug passes, no other tests regressed, and the fix is in code — not just in your head.
AI tools are especially prone to "soft done" — they will report that something is complete based on reasoning about the code, not by actually verifying it runs. The goal-driven approach forces grounding in reality.
In Practice
Transform vague tasks into verifiable goals:
| Vague task | Verifiable goal |
|---|---|
| "Add input validation" | Write tests for null, empty, and boundary inputs → make them all pass |
| "Fix the login bug" | Write a test that fails with the bug → make it pass → verify no regression |
| "Refactor the parser" | Confirm all existing tests pass before → confirm all pass after |
| "Improve performance" | Measure baseline → set target (e.g., <200ms p95) → measure again |
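The performance row in the table can be turned into a mechanical check with a few lines of standard-library Python. This is a sketch under assumptions: `measure_ms`, `p95_ms`, and `TARGET_P95_MS` are hypothetical names, the 200 ms target is the example figure from the table, and timing a callable in-process is only a rough proxy for real request latency.

```python
import statistics
import time
from typing import Callable

TARGET_P95_MS = 200.0  # example target taken from the table


def measure_ms(fn: Callable[[], object], runs: int = 100) -> list[float]:
    """Call fn repeatedly and return each call's wall-clock duration in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95_ms(samples_ms: list[float]) -> float:
    """95th percentile of the samples (needs at least two data points)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20)[-1]


def meets_target(samples_ms: list[float], target_ms: float = TARGET_P95_MS) -> bool:
    return p95_ms(samples_ms) < target_ms
```

Run `measure_ms` once to record the baseline, make the change, then run it again and let `meets_target` decide pass or fail instead of eyeballing the numbers.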
The loop for multi-step tasks:
1. State the goal in one sentence with a pass/fail criterion
2. State the first verifiable sub-step
3. Implement the sub-step
4. Verify it mechanically (run the test, check the output)
5. Only then move to the next sub-step
```python
# Goal: "POST /api/users creates a user and returns 201"

# Verification step 1: test exists and fails for the right reason.
# Assumption: `client` is a pytest fixture wrapping your framework's HTTP test client.
def test_create_user_returns_201(client):
    response = client.post("/api/users", json={"email": "test@example.com"})
    assert response.status_code == 201
    assert "id" in response.json()

# → Run test → confirm it fails with 404 (route not yet created)
# → Implement route
# → Run test → confirm it passes
# → Done. Not "looks done." Done.
```
For AI-Directed Work
When directing an AI assistant on a multi-step task, write the verification step before the implementation step:
```text
"Before implementing the cache layer, write a test that:
- Calls the function twice with the same input
- Asserts the underlying API was called exactly once
Then implement the cache until that test passes."
```
This forces the AI to define success before it guesses at implementation — exactly what the rule requires.
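Here is one way that prompt could play out, as a minimal sketch. The names `fetch_from_api` and `get_user` are hypothetical, the "underlying API" is replaced by a `unittest.mock.Mock`, and `functools.lru_cache` is just one simple caching choice among several.

```python
from functools import lru_cache
from unittest.mock import Mock

# Hypothetical stand-in for the slow underlying API call.
fetch_from_api = Mock(return_value={"name": "Ada"})


@lru_cache(maxsize=None)
def get_user(user_id: int) -> dict:
    """Cached wrapper: repeat calls with the same id skip the API."""
    return fetch_from_api(user_id)


def test_second_call_is_served_from_cache():
    # Start from a clean slate so the test does not depend on run order.
    fetch_from_api.reset_mock()
    get_user.cache_clear()

    first = get_user(42)
    second = get_user(42)

    assert first == second
    # The verification criterion from the prompt: exactly one underlying call.
    fetch_from_api.assert_called_once_with(42)
```

Written before the `@lru_cache` line exists, this test fails because the mock is called twice. That failure is the "fails for the right reason" signal, and the decorator is then the smallest change that makes it pass.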
The Four Rules as a System
These rules are not independent. They form a loop:
- Think Before Coding prevents you from building the wrong thing.
- Simplicity First keeps the thing you build small enough to reason about.
- Surgical Changes ensures your changes don't corrupt what already works.
- Goal-Driven Execution closes the loop — you don't stop until you can prove it works.
Skip any one rule and the system breaks. Implement without thinking and you build confidently in the wrong direction. Skip simplicity and you create complexity debt that makes every future change harder. Take a wide blast radius and you introduce regressions. Declare done without verification and you ship bugs wearing the costume of features.
How to Apply This With AI Coding Assistants
Put the Rules in Your AI's Context
The CLAUDE.md format (and equivalents in Cursor, Copilot, and other tools) lets you inject these rules as persistent instructions. Add them to your project's CLAUDE.md so every AI session starts with the right operating model:
```markdown
## Engineering Principles
1. State assumptions before implementing. Ask when unclear.
2. Minimum code that solves the problem. No speculative features.
3. Touch only what the task requires. Don't improve adjacent code.
4. Define verifiable success criteria. Loop until the criterion passes.
```
Use Them as Prompting Patterns
| Rule | Prompting pattern |
|---|---|
| Think Before Coding | "Before writing code, state your assumptions and ask if any are wrong." |
| Simplicity First | "Write the minimal implementation. No abstractions unless the task requires them." |
| Surgical Changes | "Change only what's necessary. Don't modify code not directly related to the task." |
| Goal-Driven Execution | "Write the test first. Implement until the test passes." |
Audit AI Output Against the Rules
After receiving any AI-generated code, run this quick audit:
- Did it assume instead of ask? Look for decisions the model made silently (library choices, data structures, error handling strategies).
- Is it longer than it needs to be? If so, ask for the shorter version explicitly.
- Did it touch things it shouldn't have? Diff carefully. Revert unrequested changes.
- Is there a verification step? If not, write one before accepting.
Summary
The Karpathy Skill is four rules that encode what senior engineers do instinctively — and what most AI-assisted developers skip:
- Think Before Coding: surface assumptions, ambiguities, and tradeoffs before the first line. Confusion that goes unstated becomes a shipped bug.
- Simplicity First: write the fewest lines that solve the problem as stated. No speculative abstractions, no unrequested flexibility. If it can be 50 lines, it should not be 200.
- Surgical Changes: touch only what the task requires. Every changed line must trace to the user's request. Clean up your own orphans; leave everything else alone.
- Goal-Driven Execution: define success as a verifiable pass/fail criterion before implementing. Loop until the criterion passes mechanically, not until it "looks right."
Together, these rules fix the real bottleneck of AI-era engineering: not the speed of code generation, but the clarity of intent and the discipline of verification. Apply them consistently, and you will produce fewer lines of code, fewer regressions, and far fewer rewrites — whether you are writing code yourself or directing an AI to write it for you.
The best engineers have always known this. Now you do too.
Source: Andrej Karpathy's CLAUDE.md — a working template for AI coding assistants based on Karpathy's engineering principles.
