MCP vs Tool Calling vs Skills: The Mental Model Every AI Builder Needs in 2026¶

You're building an AI agent. You've heard the terms thrown around — tool calling, MCP, skills — and nobody has given you a clean mental model for how they fit together. Are they competing approaches? Different names for the same thing? Should you pick one?

Here's the answer in one sentence: they are layers, not alternatives. Tool calling is the primitive. MCP is the protocol. Skills are the playbook. Production agents in 2026 use all three.

The Big Picture: Three Layers of Agent Capability¶

Before we dive in, burn this mental model into your brain:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   WHAT can the model do?   →   TOOL CALLING   (the function)   │
│                                                                 │
│   WHERE do capabilities live?  →   MCP        (the contract)   │
│                                                                 │
│   HOW to do it well?       →   SKILLS      (the playbook)      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Each layer answers a different question. Each layer depends on the one below it. You can't have MCP without tool calling. You can't have skills without both. Let's build this understanding from the ground up.

Part 1: Tool Calling — The Primitive¶

What Is It?¶

Tool calling (also called function calling) is the most fundamental capability in modern LLMs. It lets the model say: "I don't know the current weather — but I know how to ask for it."

Instead of hallucinating an answer, the model emits a structured request — a function name and a set of arguments — and your application executes the function, then feeds the result back into the conversation.

┌──────────┐      1. User asks a question        ┌──────────────────┐
│          │ ─────────────────────────────────►  │                  │
│   User   │                                     │   LLM (Claude,   │
│          │ ◄─────────────────────────────────  │   GPT-4o, etc.)  │
└──────────┘      4. Final answer                │                  │
                                                 └────────┬─────────┘
                                                          │
                                            2. Model emits tool call:
                                            {
                                              "name": "get_weather",
                                              "input": { "city": "NYC" }
                                            }
                                                          │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │  Your App runs  │
                                                 │  get_weather()  │
                                                 │                 │
                                                 │  Returns: 72°F  │
                                                 └─────────────────┘
                                                          │
                                            3. Result fed back to LLM

How It's Built¶

You define tools as JSON schemas. Each schema tells the model: what the function is named, what it does, and what arguments it accepts. The LLM never runs the function itself — it just decides when to call it and what to pass in. Your application does the actual execution.

# Python example with Anthropic SDK
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-opus-4-7",
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in NYC?"}]
)

The model responds with a tool_use block. You execute it. You feed the result back. The model continues. That's the entire loop.

When to Use Tool Calling¶

Tool calling is the right choice when:

You have 1–10 functions in a single application
You need full control over every tool available to the model
You're building a self-contained app — no external agents, no shared capabilities
The tools are simple, stateless operations: API calls, database lookups, calculations

Think of a customer support bot that can look up orders, check shipping status, and issue refunds. Three tools. One app. Tool calling is perfect.

The limit: when your tool count grows into the dozens, when multiple apps need the same tools, or when you need tools to be discoverable at runtime — that's when you need the next layer.

Part 2: MCP — The Protocol¶

What Is It?¶

MCP (Model Context Protocol) is an open standard, released by Anthropic in late 2024, that defines how AI models discover and call tools hosted on external servers. It is the contract that separates tool definition from tool execution.

Without MCP: your app owns the tools. With MCP: tools live on a server, any MCP client can use them.

                         ┌─────────────────────────────────┐
                         │         MCP Protocol            │
                         │                                 │
    ┌──────────┐         │   client      ↔      server     │         ┌───────────────────┐
    │  Claude  │ ───────►│                                 │───────► │   Tool Server     │
    │  (client)│         │   JSON-RPC 2.0 over stdio/HTTP  │         │  get_weather()    │
    └──────────┘         │                                 │         │  search_web()     │
                         └──────────────────┬──────────────┘         │  query_db()       │
    ┌──────────┐                            │                         └───────────────────┘
    │  Cursor  │ ───────────────────────────┤
    │  (client)│                            │                         ┌───────────────────┐
    └──────────┘                            └──────────────────────── │  Same tools,      │
                                                                      │  every client     │
    ┌──────────┐                                                      └───────────────────┘
    │  Notion  │ ──────────────────────────────────────────────────►
    │  (client)│
    └──────────┘

How It's Built¶

An MCP server is a lightweight process that exposes tools via a standardized protocol. The server handles the list_tools request (discovery) and the call_tool request (execution).

# A minimal MCP server in Python
from mcp.server import Server
from mcp.server.stdio import stdio_server

app = Server("weather-server")

@app.list_tools()
async def list_tools():
    return [Tool(
        name="get_weather",
        description="Get current weather for a city",
        inputSchema={"type": "object", "properties": {"city": {"type": "string"}}}
    )]

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "get_weather":
        city = arguments["city"]
        data = await fetch_weather_api(city)
        return [TextContent(type="text", text=str(data))]

async def main():
    async with stdio_server() as streams:
        await app.run(*streams, app.create_initialization_options())

The key difference from plain tool calling: this server runs independently of your app. Claude, Cursor, your own agent, a teammate's agent — any MCP-compatible client can connect to it and use the same tools. You build once. Everyone benefits.

How Discovery Works¶

When an MCP client connects to a server, it calls tools/list. The server returns all available tools with their schemas. The client presents those tools to the LLM. This happens at runtime — the client doesn't need to know in advance what tools the server offers.

Client boots  →  Connects to MCP server  →  Calls tools/list
             →  Server returns tool schemas
             →  Client injects into LLM context
             →  LLM calls tools as needed
             →  Client proxies calls to server
             →  Server executes and returns results

When to Use MCP¶

MCP is the right choice when:

You want the same tools shared across multiple apps or agents
You're building a cross-team or cross-organization capability (e.g., a company-wide database tool)
You need runtime discoverability — clients connect and discover what's available
You want to decouple the tool implementation from the AI application

A real-world example: your company builds one MCP server exposing tools for Salesforce, Jira, and your internal wiki. Every team's AI agent — whether it's Claude, Cursor, or a custom agent — connects to the same server. You update the tools once. Everyone gets the update instantly.

The limit: MCP defines where capabilities live and how to call them. It doesn't encode expertise — the nuanced, multi-step knowledge of how to use those tools effectively for a complex task. That's what skills are for.

Part 3: Skills — The Playbook¶

What Is It?¶

A skill is encoded expertise. It's a structured bundle that tells an AI agent not just that a capability exists, but exactly how to use it well — what steps to follow, what context to load, what edge cases to handle.

If tool calling answers "what can I do?" and MCP answers "where do the tools live?", skills answer "how do I do this task like an expert?"

skills/
├── docx-creator/
│   ├── SKILL.md          ← The playbook (triggers, steps, rules)
│   ├── templates/        ← Loaded progressively as needed
│   └── scripts/          ← Automation helpers
│
├── pdf-filler/
│   ├── SKILL.md
│   └── examples/
│
└── code-reviewer/
    ├── SKILL.md
    └── style-guide.md    ← Reference loaded on demand

How It's Built¶

A skill is centered on a SKILL.md file. It defines:

Triggers: what user intent activates this skill
Steps: the exact procedure to follow
References: what files or context to load at each stage
Rules: constraints, quality bars, edge cases

# SKILL: Code Reviewer

## Triggers
- User asks to review a PR or code change
- User says "review this" with code attached

## Steps
1. Load `style-guide.md` for project conventions
2. Read the diff or file provided
3. Check for: security issues, logic errors, naming violations
4. For each finding: cite the exact line, explain the issue, suggest the fix
5. Rate overall quality: Approve / Request Changes / Block

## Rules
- Never approve code with SQL injection or XSS risks
- Flag > 200-line functions for refactoring
- Check that tests exist for new public functions

The progressive disclosure model is key: the runtime loads only the SKILL.md first. Additional files (style-guide.md, templates, scripts) are loaded only when the relevant step in the playbook is reached. This keeps context lean and performance high.

User request → Trigger match → Load SKILL.md → Execute step 1
                                              → Load style-guide.md (step 2 needs it)
                                              → Execute step 2
                                              → Execute step 3 (no extra load needed)
                                              → Done

When to Use Skills¶

Skills are the right choice when:

The task is complex and multi-step — not a single tool call but a sequence
The task is repeated often — reviews, document generation, deployments, incident response
You need consistent quality across different sessions and different people
The expertise is non-obvious — it took a human expert time to figure out the right approach

Real examples in 2026: a "deploy to production" skill that checks tests, bumps the version, opens a PR, waits for CI, and promotes the deployment. A "weekly report" skill that queries three data sources, formats a summary, and posts to Slack. A "security audit" skill that runs five different checks in sequence.

The limit: skills don't exist in isolation. They use tools (via tool calling) and pull capabilities from servers (via MCP). They're the top layer, not a replacement.

How All Three Work Together¶

Here's the complete picture of a production agent in 2026:

┌────────────────────────────────────────────────────────────────────┐
│                        Production Agent                            │
│                                                                    │
│   ┌──────────────┐                                                 │
│   │    SKILLS    │  ← Playbooks loaded on trigger                  │
│   │  (the HOW)   │    encode expertise, define workflow            │
│   └──────┬───────┘                                                 │
│          │                                                         │
│          │  calls                                                  │
│          ▼                                                         │
│   ┌──────────────┐     discovers     ┌──────────────────────────┐  │
│   │     MCP      │ ────────────────► │   MCP Servers            │  │
│   │  (the WHERE) │                   │   - Salesforce tools     │  │
│   └──────┬───────┘                   │   - Internal DB tools    │  │
│          │                           │   - GitHub tools         │  │
│          │  exposes                  └──────────────────────────┘  │
│          ▼                                                         │
│   ┌──────────────┐                                                 │
│   │ TOOL CALLING │  ← Primitive function execution                 │
│   │  (the WHAT)  │    JSON schema, model decides when to call      │
│   └──────────────┘                                                 │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

A real workflow: a user asks the agent to "do a security review of this PR." The agent:

Skill triggers — the code-reviewer skill matches the request and loads its playbook
MCP discovery — the skill instructs the agent to pull the PR diff from the GitHub MCP server
Tool calling — the agent calls get_pull_request(pr_number=42) via the GitHub tool
Skill execution — the playbook steps run: check for vulnerabilities, check for test coverage, verify naming conventions
Result — a structured review, formatted exactly as the skill defines it

None of these layers could produce this result alone. Together, they make an expert.

Quick Reference: When to Use What¶

	Tool Calling	MCP	Skills
Mental model	A function	A contract	A playbook
Answers	What can I do?	Where do capabilities live?	How do I do it well?
Best for	Small, controlled setups	Cross-app, scalable systems	Complex, repeatable tasks
Scope	1–10 functions, single app	Multi-app, multi-team	Multi-step expert workflows
Control	Full (you define everything)	Shared (server owns tools)	Procedural (playbook owns steps)
Effort to build	Low	Medium	Medium–High
Effort to reuse	High (copy-paste)	Low (connect and discover)	Low (trigger and load)
2026 use case	Simple chatbots, quick prototypes	Company-wide tool infrastructure	Agent automation, expert tasks

The "When Do I Add Each Layer?" Decision Tree¶

Is your AI feature a simple Q&A or single-step action?
├── YES → Tool Calling only. Define 1-5 functions. Ship it.
└── NO  → Continue...

Do multiple apps or agents need the same capabilities?
├── YES → Add MCP. Build a server. Let clients discover tools at runtime.
└── NO  → Tool Calling in your app is fine. Continue...

Is the task multi-step, repeated, and quality-sensitive?
├── YES → Add a Skill. Encode the expertise in SKILL.md.
└── NO  → You're done. MCP + Tool Calling covers it.

Common Mistakes to Avoid¶

Mistake 1: Using MCP when Tool Calling is enough If you're building one app with five tools, you don't need a server. Add the tools directly. MCP adds operational overhead — only pay that cost when sharing tools across boundaries.

Mistake 2: Expecting MCP to encode expertise MCP tells your agent that search_codebase() exists. It does not tell the agent when to search, what query to construct, or how to interpret the results. That's a skill's job.

Mistake 3: Writing skills without triggers A skill with no trigger is a skill that never runs. Define your triggers precisely — vague triggers cause skills to fire when they shouldn't, polluting the agent's context.

Mistake 4: Treating them as alternatives The most common mistake of all. Tool calling is not a "simpler MCP." MCP is not a "fancier skill." They are layers. Your production agent will have all three.

A Concrete Starting Point for 2026¶

If you're building your first real agent today, here's where to start:

Week 1 — Get tool calling working Pick one capability your agent needs (search, database lookup, send email). Define it as a JSON schema. Wire it into your LLM call. Test the loop.

Week 2 — Extract to MCP when you hit the sharing wall When a second app needs the same tool, extract it to an MCP server. Use the official MCP SDK for Python or TypeScript. Your first app becomes a client.

Week 3 — Write your first skill Pick the most repeated, quality-sensitive task your agent does. Write a SKILL.md. Define the triggers, steps, and references. Measure if output quality improves — it will.

Summary¶

The confusion between tool calling, MCP, and skills exists because people conflate three questions that have three different answers.

Tool calling answers what can the model do — it's the primitive function mechanism, perfect for 1–10 capabilities in a single controlled application.

MCP answers where capabilities live — it's the open protocol that lets you build tool infrastructure once and have every agent, app, and assistant discover and use it at runtime.

Skills answer how to do complex tasks well — they're encoded playbooks that transform raw tool access into expert, repeatable, multi-step workflows.

The mental model is a stack: tools are the atoms, MCP is the infrastructure, skills are the intelligence. Production agents in 2026 don't choose between these layers — they compose all three to build systems that are capable, scalable, and genuinely useful.

Start with tools. Graduate to MCP when you need to share. Add skills when you need to be good at something.

Discussion

Have thoughts on this post? Share them below — questions, corrections, or your own experience are all welcome.