Blog

Building Effective AI Agents: The Anthropic Playbook

Most teams building AI agents are solving the wrong problem.

They spend months wiring together orchestration frameworks, reflection loops, and multi-agent graphs — before they've verified that the simplest version of their agent actually works. Then they wonder why the system is expensive, slow, and impossible to debug.

Barry Zhang from Anthropic gave a talk that cuts through all of that. The core message was blunt: most teams are building agents too early, and when they do build them, they build them wrong.

Source: Barry Zhang, Anthropic — "How We Build Effective Agents" (YouTube)

AI Agent Application Demo: Putting a Brain Inside Your App

Source code: github.com/pkhamdee/coffee-agent

There's a quiet revolution happening in how we write software. For decades, we've built applications the same way: write a function, call the next function, handle each case with an if statement, repeat. The logic is explicit, deterministic, and completely predictable — a flowchart carved into code.

That model still works. But it has a hard ceiling.

When a user wants to do something that doesn't fit neatly into your flowchart — when they say something ambiguous, change their mind mid-conversation, or combine requests in ways you didn't anticipate — the rigid-logic app breaks down. You end up writing more and more special-case handling until the code becomes unmaintainable.

AI agents flip this model. Instead of programming every decision upfront, you give your application a reasoning engine — a brain — and let it figure out what to do. The application stops being a flowchart and starts being a collaborator.

This post walks through a real, runnable example: a coffee shop ordering chatbot called Coffee Agent. It's a full-stack app built with NestJS, React, LangGraph, and a local LLM running on Ollama. By the time you finish reading, you'll understand exactly what an agent is, why this architecture is powerful, and how to build one yourself.

Agentic AI Architectures: Patterns, Frameworks, and MCP for Enterprise Systems

Most AI tutorials show you how to call an API and get a response. That's not an agent. An agent is a system that perceives, plans, acts, and adapts — autonomously — using tools, memory, and other agents to complete tasks that no single LLM call could handle.
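
To make the perceive, plan, act, adapt cycle concrete, here is a minimal sketch of an agent loop. It is an illustration under stated assumptions, not the code from the post: `scripted_policy` is a hypothetical stand-in for the LLM, and the `add` and `multiply` tools, `TOOLS`, and `run_agent` are names invented for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real agent would expose APIs, databases, and so on.
def add(a: float, b: float) -> float:
    return a + b

def multiply(a: float, b: float) -> float:
    return a * b

TOOLS: Dict[str, Callable[..., float]] = {"add": add, "multiply": multiply}

Memory = List[Tuple[str, float]]

def scripted_policy(goal: str, memory: Memory) -> Tuple[str, tuple]:
    """Stand-in for an LLM call (assumption): picks the next action
    from the goal and what has been observed so far."""
    if not memory:
        return ("add", (2, 3))
    if len(memory) == 1:
        return ("multiply", (memory[-1][1], 4))
    return ("finish", (memory[-1][1],))

def run_agent(goal: str, policy, max_steps: int = 5) -> float:
    memory: Memory = []
    for _ in range(max_steps):
        action, args = policy(goal, memory)   # perceive + plan
        if action == "finish":
            return args[0]
        result = TOOLS[action](*args)         # act: call the chosen tool
        memory.append((action, result))       # adapt: remember the observation
    raise RuntimeError("step budget exhausted before the agent finished")

answer = run_agent("compute (2 + 3) * 4", scripted_policy)
print(answer)
```

The step budget (`max_steps`) is a design choice in this sketch: it keeps a misbehaving policy from looping forever, which matters once the scripted policy is swapped for a real model.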

In 2026, agentic AI is the dominant paradigm for building AI into enterprise software. Not chatbots. Not search bars with AI behind them. Full autonomous systems that can research a topic, write code, test it, file a ticket, notify a Slack channel, and self-correct when something goes wrong — without a human in the loop for every step.

This is the definitive guide. We cover every design pattern, every major framework, the Model Context Protocol that is quietly unifying the entire ecosystem, and how to wire all of it into production enterprise systems.

GPU for AI Explained: VRAM, CUDA Cores, Tensor Cores, and Everything In Between

You've heard it countless times: "You need a GPU to train AI models." But why? What is a GPU actually doing that a CPU can't do nearly as fast? What are CUDA Cores, Tensor Cores, and VRAM, and why do AI engineers obsess over these numbers?

This guide starts from scratch and builds a complete mental model of GPU hardware for AI. By the end, you'll understand exactly what's happening inside the chip when your model trains — and how to pick the right hardware for the job.

Master Generative AI — Part 2: Working with LLMs

Part 2 of the Master Generative AI: A Step-by-Step Challenge series.

In Part 1 you built the conceptual foundation. Now we get our hands dirty. This part is where theory becomes practice — you'll write code that tokenizes text, queries embeddings, builds a RAG pipeline, and ships your first working chatbot.

MCP vs Tool Calling vs Skills: The Mental Model Every AI Builder Needs in 2026

You're building an AI agent. You've heard the terms thrown around — tool calling, MCP, skills — and nobody has given you a clean mental model for how they fit together. Are they competing approaches? Different names for the same thing? Should you pick one?

Here's the answer in one sentence: they are layers, not alternatives. Tool calling is the primitive. MCP is the protocol. Skills are the playbook. Production agents in 2026 use all three.
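
As a rough illustration of the lowest layer, tool calling as a primitive can be sketched as follows: the model emits a structured request and the application looks up and runs the matching function. The JSON shape, the `get_weather` tool, `TOOL_REGISTRY`, and `dispatch` are all assumptions made for this example; this is not MCP's wire format or any specific vendor's API.

```python
import json
from typing import Callable, Dict

# Hypothetical tool; a real one would call an external service.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# The application owns the registry of callable tools.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Parse a structured tool call and run the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name](**call["arguments"])

# Assumed shape of what a model might emit when it decides to use a tool.
model_output = '{"name": "get_weather", "arguments": {"city": "Bangkok"}}'
result = dispatch(model_output)
print(result)
```

In this picture, a protocol such as MCP would standardise how tools like `get_weather` are discovered and invoked across processes, and a skill would describe when and in what order to use them. Both layers sit above the dispatch step shown here.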

DevOps Project Example: From Code Push to Production with GitOps, FluxCD, and Kubernetes

Most DevOps tutorials show you a pipeline diagram. This one shows you a real pipeline, built on a real application, running on real Kubernetes clusters — with every tool, every workflow, and every design decision explained.

This post walks through the complete CI/CD system behind Slotmachine — a real-time multiplayer tournament app — from the moment a developer pushes code to GitHub, through six security and quality gates, all the way to automated deployment on both Nutanix on-premise clusters and AWS EKS. No hand-waving. No "and then magic happens."

The full source code is available in two repositories:

Nutanix Cloud Platform Overview

Most enterprises still run their workloads on a tangle of separate systems — one vendor for compute, another for storage, another for networking, yet another for virtualization. Managing all of that is expensive, slow, and fragile. Nutanix was founded on one radical idea: collapse all of those layers into a single, software-defined platform that runs on commodity hardware and is as simple to operate as a public cloud.

In 2026, Nutanix Cloud Platform (NCP) has grown from that original idea into a comprehensive stack spanning private cloud infrastructure, multi-cloud management, enterprise Kubernetes, database-as-a-service, AI infrastructure, and unified storage — all managed through a single pane of glass.

Building an LLM from Scratch in PyTorch: The Full Lifecycle Cheatsheet

Most LLM tutorials give you one of two things: a high-level diagram with boxes and arrows, or a 10,000-line codebase with no explanation of why each piece exists.

This post is neither. It's a step-by-step lifecycle — 8 phases, each with working PyTorch code, the reasoning behind every decision, and an explicit Do / Don't list that captures the mistakes that cost most beginners weeks of wasted compute.

By the end you'll have built, trained, modernised, scaled, and aligned a language model — the exact same lifecycle that produced every major LLM you've used.

Phase 1: Core Transformer    → the engine
Phase 2: Train a Tiny LLM    → prove the pipeline works
Phase 3: Modernise           → match 2026 architecture
Phase 4: Scale Efficiently   → push past toy datasets
Phase 5: Mixture of Experts  → conditional computation
Phase 6: SFT                 → turn autocomplete into an assistant
Phase 7: Reward Modelling    → teach the model what "good" looks like
Phase 8: RLHF                → optimise for human preference

LLMs and the Transformer Architecture: A Beginner's Complete Guide

You've chatted with ChatGPT. You've asked Claude for help. You've seen GitHub Copilot finish your sentences. But have you ever wondered what is actually happening inside these systems? How does a computer — a machine that ultimately only understands 0s and 1s — produce text that reads like it was written by a thoughtful human?

This guide answers that question from the ground up. No PhD required. We'll start with an analogy a child could follow, then gradually build up to a precise technical understanding of the Transformer architecture that powers every major LLM today.