AI Agent Application Demo: Putting a Brain Inside Your App

Source code: github.com/pkhamdee/coffee-agent

For decades we've built applications the same way: write a function, call the next function, handle each case with an if statement. The logic is explicit, deterministic, and completely predictable — a flowchart carved into code.

That model has a hard ceiling. When a user says something ambiguous, changes their mind mid-conversation, or combines requests in ways you didn't anticipate, the rigid-logic app breaks down. You write more and more special-case handling until the code becomes unmaintainable.

AI agents flip this model. You give your application a reasoning engine — a brain — and let it figure out what to do. This post walks through a real, runnable example: Coffee Agent, a coffee shop ordering chatbot built with NestJS, React, LangGraph, and a local LLM on Ollama.

What You'll Learn

By the end of this post you'll be able to:

Explain what an agent is — brain, memory, environment — without hand-waving
Build a one-node LangGraph agent with MongoDB-backed conversation persistence
Inject dynamic state into the system prompt on every turn
Write server-side guardrails that override the LLM when it breaks a business rule
Verify LLM claims against what the user actually said (trust-but-verify)
Unit test an agent application — yes, really

Prerequisites: comfortable with TypeScript and basic NestJS or Express. Zero LangChain or LLM experience required.

What Is an Agent, Really?

The word "agent" gets thrown around a lot. Let's be precise.

In traditional software, control flow is hardcoded. A shopping cart app literally has if (item.inCart) { removeFromCart() } else { addToCart() }. The developer made that decision — the code just executes it.

In an agent-based application, you replace some of that hardcoded logic with a language model (LLM) that reasons about what to do next. The LLM reads the current state of the world (the conversation, the data, the context), thinks, and produces a structured response that drives the application forward.

An agent has three core components:

Component	What it is	In Coffee Agent
Brain	The LLM that reasons	Ollama running `qwen2.5:7b` locally
Memory	State that persists across turns	MongoDB checkpoints via LangGraph
Environment	Context and rules the agent operates in	System prompt with menu, order state, and guardrails

The key insight: you don't tell the agent exactly what to do — you tell it what it knows and what it should accomplish, then let it reason its way there.
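
As a rough mental model, one agent turn can be sketched as a function over those three components. The types and the `fakeBrain` stub below are illustrative assumptions, not code from the repository; in the real app the brain is an LLM call and the memory is a MongoDB checkpoint.

```typescript
// Hypothetical types for illustration; none of these names come from the repo.
type Message = { role: 'user' | 'assistant'; content: string };
type Brain = (systemPrompt: string, history: Message[]) => string;
type Memory = Map<string, Message[]>;               // thread id -> messages
type Environment = (history: Message[]) => string;  // builds the system prompt

function runTurn(
  brain: Brain,
  memory: Memory,
  environment: Environment,
  threadId: string,
  userText: string,
): string {
  // Memory: load what happened on earlier turns of this thread.
  const history: Message[] = [
    ...(memory.get(threadId) ?? []),
    { role: 'user', content: userText },
  ];
  // Environment: describe the rules and the current state to the brain.
  const systemPrompt = environment(history);
  // Brain: decide what to say next.
  const reply = brain(systemPrompt, history);
  // Memory: persist the new turn so the next request can continue from it.
  memory.set(threadId, [...history, { role: 'assistant', content: reply }]);
  return reply;
}

// A stub brain so the sketch runs without an LLM.
const fakeBrain: Brain = (_prompt, history) =>
  `You have sent ${history.filter((m) => m.role === 'user').length} message(s).`;
```

Swapping `fakeBrain` for a function that calls a model changes nothing else in `runTurn`, which is the point of keeping the three roles separate.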

The Coffee Agent: What It Does

Coffee Agent is a conversational barista. You open the app, type "I'd like something with oat milk," and an AI named Maya guides you through placing a coffee order. She asks for your name, confirms your drink, collects customizations one at a time, and when everything is confirmed, saves the order to MongoDB.

The app has two views:

Customer chat — a React UI where you talk to Maya
Barista dashboard — a staff view where pending orders appear and get marked delivered via PATCH /chats/orders/:id/deliver (a soft delete — the order keeps a status field instead of being removed)

What makes the architecture interesting:

Conversation state is fully persistent — close the tab, come back, Maya remembers everything
After a completed order, the same thread starts a fresh order session without losing history
Every customization is drink-aware — an Espresso won't be asked about size or toppings, a Frappuccino will
The LLM runs entirely locally — no cloud API key required
The deterministic logic is extracted into a pure module with Jest unit tests

The Tech Stack

Backend (src/)

Package	Version	Purpose
`@nestjs/common`	^10	HTTP framework
`@langchain/core`	^0.3.75	Prompt templates, message types
`@langchain/langgraph`	^0.4.8	Agent orchestration
`@langchain/langgraph-checkpoint-mongodb`	^0.1.1	Conversation persistence
`@langchain/ollama`	^0.2.4	Local LLM provider
`mongoose`	^8.18	MongoDB ODM
`class-validator`	^0.14	Request DTO validation
`zod`	^4.1.8	Schema validation

The dependency list is deliberately lean: no monolithic langchain package, no unused cloud providers. To swap Ollama for OpenAI or Gemini you'd add @langchain/openai or @langchain/google-genai and change one constructor — the rest of the code never touches the provider.

Frontend (client/)

Package	Purpose
React 19 + Vite	Chat UI
Tailwind CSS 4	Styling

Infrastructure

Ollama on localhost:11434 (local LLM)
MongoDB via Docker Compose (mongodb/docker-compose.yml) — stores both orders and LangGraph checkpoints

Architecture: One Node, All the Intelligence

Here's the full request lifecycle before we go deep into each piece:

User types a message
        ↓
React → POST /api/chats/message/:threadId
        ↓
NestJS ChatsController → ChatService.chatWithAgent()
        ↓
   ┌──────────────────────────────────────────┐
   │  chatWithAgent() in chats.service.ts     │
   │                                          │
   │  1. Load previous state from the graph   │
   │  2. Find the session start boundary      │
   │  3. Rebuild knownOrder from history      │
   │  4. Invoke the graph → callModel():      │
   │     · compute still-needed fields        │
   │     · build dynamic system prompt        │
   │     · call Ollama (JSON mode)            │
   │  5. Parse response (recover if garbage)  │
   │  6. Accept only values the user said     │
   │  7. Apply server-side guards             │
   │  8. Save order if completed              │
   └──────────────────────────────────────────┘
        ↓
Structured JSON → React updates chat + order sidebar

Notice there are no branches in the graph. The intelligence lives entirely inside one function — and the safety lives in deterministic code around it.

Step 1: The LangGraph Graph

src/chats/chats.service.ts defines the agent as a LangGraph state machine, built once in the NestJS constructor and reused for every request:

// src/chats/chats.service.ts
constructor(
  @InjectModel(Order.name) private orderModel: Model<Order>,
  @InjectConnection() private connection: Connection,
) {
  this.llm = new ChatOllama({
    model: process.env.OLLAMA_MODEL || 'qwen2.5:7b',
    temperature: 0,
    baseUrl: (process.env.OLLAMA_BASE_URL || 'http://localhost:11434').replace('/v1', ''),
    format: 'json',
  });

  // Reuse the Mongoose connection's underlying driver client for
  // LangGraph checkpoints instead of opening a second connection.
  const checkpointer = new MongoDBSaver({
    client: this.connection.getClient() as unknown as MongoClient,
    dbName: CHECKPOINT_DB_NAME,
  });

  this.agent = new StateGraph(GraphState)
    .addNode('agent', (state, config) => this.callModel(state, config))
    .addEdge(START, 'agent')
    .addEdge('agent', END)
    .compile({ checkpointer });
}

One node (agent), two edges. That's it. The complexity lives inside callModel, not in the graph topology.

Two details worth copying into your own projects:

Compile the graph once. An early version connected to MongoDB and rebuilt the graph on every request. Moving it to the constructor removed a connection per message.
Share the database connection. The checkpointer rides on the same MongoClient that Mongoose already opened — one connection pool, not two.

The checkpointer serializes the full conversation — every message, in order — to MongoDB after each turn. When the next request arrives, LangGraph rehydrates the exact state and continues where it left off. Conversation persistence comes for free.
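
To make the persistence idea concrete, here is a minimal in-memory stand-in for a checkpointer. It is only a sketch: the `InMemoryCheckpointer` class and its `load`/`append` methods are hypothetical and do not mirror the `MongoDBSaver` API. It shows the behavior described above, where state is serialized per thread and rehydrated on the next request.

```typescript
// Illustrative stand-in for a checkpointer; NOT the MongoDBSaver API.
type StoredMessage = { role: 'human' | 'ai'; content: string };

class InMemoryCheckpointer {
  // Serialized state per thread, the way a database document would hold it.
  private store = new Map<string, string>();

  // Rehydrate the full message list for a thread (empty if unseen).
  load(threadId: string): StoredMessage[] {
    const raw = this.store.get(threadId);
    return raw ? (JSON.parse(raw) as StoredMessage[]) : [];
  }

  // Append new messages and write the whole state back.
  append(threadId: string, ...messages: StoredMessage[]): void {
    const next = [...this.load(threadId), ...messages];
    this.store.set(threadId, JSON.stringify(next));
  }
}

// Two separate "requests" on the same thread see one continuous history.
const saver = new InMemoryCheckpointer();
saver.append('thread-1', { role: 'human', content: 'Hi' }, { role: 'ai', content: 'Welcome!' });
saver.append('thread-1', { role: 'human', content: 'A latte please' });
```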

Step 2: Session Boundary Detection

Here's a design you won't find in most tutorials: the service tracks session boundaries within a single thread.

When a customer finishes one order and starts another in the same conversation, the agent needs to ignore the first order's context. The code finds the boundary by scanning backwards through message history for the last "completed" marker:

// src/chats/chats.service.ts
let sessionStart = 0;
for (let i = prevMessages.length - 1; i >= 0; i--) {
  const m = prevMessages[i];
  if (!(m instanceof AIMessage)) continue;
  try {
    if (extractJsonResponse(m.content)?.progress === 'completed') {
      sessionStart = i + 1;   // New session starts after the last completed order
      break;
    }
  } catch { /* not a structured message */ }
}

Everything before sessionStart is history. Everything from sessionStart forward is the current order in progress. The model only ever sees the current session's messages, so it never confuses order A's oat milk with order B.
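
The backward scan can be tried in isolation. The sketch below swaps LangChain's message classes for plain objects (the `ChatMessage` type and `findSessionStart` name are assumptions for illustration) and uses `JSON.parse` in place of `extractJsonResponse`.

```typescript
// Plain-object stand-ins for LangChain's HumanMessage / AIMessage.
type ChatMessage = { kind: 'human' | 'ai'; content: string };

// Returns the index of the first message of the current order session.
function findSessionStart(messages: ChatMessage[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (!m || m.kind !== 'ai') continue;
    try {
      const parsed = JSON.parse(m.content) as { progress?: string };
      if (parsed.progress === 'completed') return i + 1;
    } catch {
      // Not structured JSON; keep scanning.
    }
  }
  return 0; // No completed order yet: the whole thread is one session.
}

const thread: ChatMessage[] = [
  { kind: 'human', content: 'A latte please' },
  { kind: 'ai', content: '{"message":"Order placed!","progress":"completed"}' },
  { kind: 'human', content: 'One more: an espresso' },
  { kind: 'ai', content: '{"message":"Sure, how many?","progress":"in_progress"}' },
];
console.log(findSessionStart(thread)); // 2
```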

Step 3: Rebuilding Order State from History

Rather than trusting the LLM to maintain a running order object, the server rebuilds it from scratch every turn by replaying the session's message history.

AI messages supply confirmed field values. Human messages supply explicit declines — because users often say "no milk please" in a way the model might store as null instead of "No Milk":

// src/chats/chats.service.ts
const knownOrder: Record<string, any> = {};
for (const m of prevMessages.slice(sessionStart)) {
  if (m instanceof HumanMessage) {
    const h = String(m.content).toLowerCase();
    for (const [field, pattern, canonical] of NO_DECLINE_PATTERNS) {
      if (h.includes(pattern)) knownOrder[field] = canonical;
    }
  } else if (m instanceof AIMessage) {
    try {
      const co = extractJsonResponse(m.content)?.current_order;
      if (!co) continue;
      for (const [k, v] of Object.entries(co)) {
        if (v === null || v === 'null' || v === undefined) continue;
        const strV = String(v).trim();
        if (!strV) continue;
        // Reject menu vocabulary masquerading as a customer name
        if (k === 'name' && !isValidCustomerName(strV)) continue;
        knownOrder[k] = v;
      }
    } catch { /* skip */ }
  }
}

The decline patterns are a simple lookup table in agent-utils.ts:

// src/chats/agent-utils.ts
// [field, pattern to detect in the user message, canonical stored value]
export const NO_DECLINE_PATTERNS: Array<[string, string, string]> = [
  ['toppings',   'no topping',   'No Toppings'],
  ['milk',       'no milk',      'No Milk'],
  ['syrup',      'no syrup',     'No Syrup'],
  ['sweeteners', 'no sweetener', 'No Sweetener'],
];

The result is knownOrder — an authoritative snapshot of what has been confirmed, independent of what the model might claim.
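
Here is a self-contained sketch of the same replay idea, using plain objects instead of LangChain messages. The `Msg` type, the shortened `DECLINES` table, and `rebuildOrder` are illustrative names, and the customer-name check is left out for brevity.

```typescript
type Msg = { kind: 'human' | 'ai'; content: string };
type KnownOrder = Record<string, string | number>;

// [field, phrase to detect, canonical stored value]; a subset of the table above.
const DECLINES: Array<[string, string, string]> = [
  ['milk', 'no milk', 'No Milk'],
  ['syrup', 'no syrup', 'No Syrup'],
];

function rebuildOrder(session: Msg[]): KnownOrder {
  const known: KnownOrder = {};
  for (const m of session) {
    if (m.kind === 'human') {
      // Human messages contribute explicit declines.
      const text = m.content.toLowerCase();
      for (const [field, phrase, canonical] of DECLINES) {
        if (text.includes(phrase)) known[field] = canonical;
      }
      continue;
    }
    try {
      const parsed = JSON.parse(m.content) as { current_order?: Record<string, unknown> };
      for (const [k, v] of Object.entries(parsed.current_order ?? {})) {
        // Skip empty values so a recorded decline is never overwritten by null.
        if (typeof v === 'number') known[k] = v;
        else if (typeof v === 'string' && v.trim() && v !== 'null') known[k] = v;
      }
    } catch {
      // Unparseable AI message: ignore it and keep replaying.
    }
  }
  return known;
}

const rebuilt = rebuildOrder([
  { kind: 'ai', content: '{"current_order":{"name":"Alex","drink":"Latte","milk":null}}' },
  { kind: 'human', content: 'No milk please' },
  { kind: 'ai', content: '{"current_order":{"name":"Alex","drink":"Latte","milk":null}}' },
]);
// rebuilt.milk is "No Milk" even though the model kept storing null.
```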

Step 4: Building the Dynamic System Prompt

This is where the brain gets its instructions. Inside callModel, the prompt is rebuilt every turn. First the code computes exactly which fields still need collecting, using each drink's capability flags:

// src/chats/chats.service.ts
const drinkMeta = DRINKS.find(
  (d) => d.name.toLowerCase() === (knownOrder.drink ?? '').toLowerCase(),
);
const stillNeeded: string[] = [];
if (!knownOrder.name) stillNeeded.push("name — greet and ask for the customer's name");
if (!knownOrder.drink) stillNeeded.push('drink — ask which drink');
if (drinkMeta?.supportSize && !knownOrder.size) stillNeeded.push('size');
if (drinkMeta?.supportMilk && !knownOrder.milk) stillNeeded.push('milk');
if (drinkMeta?.supportSyrup && !knownOrder.syrup) stillNeeded.push('syrup');
if (drinkMeta?.supportSweeteners && !knownOrder.sweeteners) stillNeeded.push('sweeteners');
if (drinkMeta?.supportTopping && !knownOrder.toppings) stillNeeded.push('toppings');
if (knownOrder.drink && !knownOrder.quantity) stillNeeded.push('quantity');

The capability flags live in menu_data.ts — the single source of truth for the menu:

// src/lib/utils/constants/menu_data.ts
export const DRINKS: Drink[] = [
  {
    name: 'Espresso',
    description: 'Strong concentrated coffee shot.',
    supportMilk: false, supportSweeteners: true,
    supportSyrup: true, supportTopping: false, supportSize: false,
  },
  {
    name: 'Latte',
    description: 'Espresso with steamed milk, smooth and creamy.',
    supportMilk: true, supportSweeteners: true,
    supportSyrup: true, supportTopping: true, supportSize: true,
  },
  // Cappuccino, Cold Brew, Frappuccino...
];
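
To see how the flags drive the question list, the sketch below packages the same checks as a pure function over the two menu entries shown above. The `DrinkCaps` and `OrderDraft` types and the `fieldsStillNeeded` name are assumptions made for this example, and the field labels are shortened.

```typescript
interface DrinkCaps {
  name: string;
  supportSize: boolean;
  supportMilk: boolean;
  supportSyrup: boolean;
  supportSweeteners: boolean;
  supportTopping: boolean;
}

type OrderDraft = {
  name?: string; drink?: string; size?: string; milk?: string;
  syrup?: string; sweeteners?: string; toppings?: string; quantity?: number;
};

// Two entries copied from the menu excerpt above.
const MENU: DrinkCaps[] = [
  { name: 'Espresso', supportMilk: false, supportSweeteners: true, supportSyrup: true, supportTopping: false, supportSize: false },
  { name: 'Latte', supportMilk: true, supportSweeteners: true, supportSyrup: true, supportTopping: true, supportSize: true },
];

function fieldsStillNeeded(order: OrderDraft): string[] {
  const meta = MENU.find((d) => d.name.toLowerCase() === (order.drink ?? '').toLowerCase());
  const needed: string[] = [];
  if (!order.name) needed.push('name');
  if (!order.drink) needed.push('drink');
  // Customizations are only asked for when the drink supports them.
  if (meta?.supportSize && !order.size) needed.push('size');
  if (meta?.supportMilk && !order.milk) needed.push('milk');
  if (meta?.supportSyrup && !order.syrup) needed.push('syrup');
  if (meta?.supportSweeteners && !order.sweeteners) needed.push('sweeteners');
  if (meta?.supportTopping && !order.toppings) needed.push('toppings');
  if (order.drink && !order.quantity) needed.push('quantity');
  return needed;
}

console.log(fieldsStillNeeded({ name: 'Alex', drink: 'Espresso' }));
// [ 'syrup', 'sweeteners', 'quantity' ]
```

An Espresso never produces a size, milk, or toppings question, while a Latte with the same inputs would ask for all five customizations.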

Then everything — menu, known order, next task, output format — is injected into the system prompt:

// src/chats/chats.service.ts (abridged)
const prompt = ChatPromptTemplate.fromMessages([
  {
    role: 'system',
    content: `
You are Maya, a friendly and enthusiastic barista at a cozy café...

**AVAILABLE DRINKS AND OPTIONS**
${DRINKS.map((drink) => `- ${createDrinkItemSummary(drink)}`).join('\n')}
Sizes: ${createSizesSummary()}
Milks: ${createAvailableMilksSummary()}
...

**WHAT YOU ALREADY KNOW ABOUT THIS ORDER (do NOT ask again)**
${knownOrderJson}

**YOUR NEXT TASK**
${nextFieldInstruction}

**RULES**
- One question per reply — never bundle multiple questions.
- When they decline something, store the exact decline string. Do NOT store null.
- Only set progress "completed" after the customer explicitly confirms the order.

**OUTPUT FORMAT — RETURN ONLY THIS JSON, NO OTHER TEXT**
{{"message":"...","current_order":{{...}},"suggestions":[...],"progress":"in_progress"}}
    `,
  },
  new MessagesPlaceholder('messages'),
]);

Maya is told: "Ask for exactly this, in this order, one at a time." When all fields are collected, the instruction flips to: "Read back the full order and ask the customer to confirm. Set progress 'completed' only after they explicitly say yes."

One gotcha worth knowing: knownOrder is serialized with its braces escaped ({ → {{ and } → }}) before injection, because LangChain prompt templates treat single braces as variable placeholders. Forget this and you'll get a cryptic template error the first time the order JSON is non-empty.
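
A minimal sketch of that escaping step follows. The `escapeBraces` helper name is an assumption, and `formatTemplate` is a toy formatter written only to illustrate the single-brace versus double-brace convention; it is not LangChain's implementation.

```typescript
// Double every brace so a template engine treats it as a literal character.
function escapeBraces(text: string): string {
  return text.replace(/[{}]/g, (brace) => brace + brace);
}

// Toy formatter with the same convention: {name} is a variable,
// {{ and }} are literal braces. For illustration only.
function formatTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{|\}\}|\{(\w+)\}/g, (match: string, name?: string) => {
    if (match === '{{') return '{';
    if (match === '}}') return '}';
    const value = name === undefined ? undefined : vars[name];
    if (value === undefined) throw new Error(`Missing variable: ${match}`);
    return value;
  });
}

const knownOrderJson = escapeBraces(JSON.stringify({ drink: 'Latte' }));
console.log(knownOrderJson); // {{"drink":"Latte"}}

const rendered = formatTemplate(`Known: ${knownOrderJson} Customer: {customer}`, { customer: 'Alex' });
console.log(rendered); // Known: {"drink":"Latte"} Customer: Alex
```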

Step 5: Calling the LLM

The LLM setup uses two key settings:

// src/chats/chats.service.ts
this.llm = new ChatOllama({
  model: process.env.OLLAMA_MODEL || 'qwen2.5:7b',
  temperature: 0,     // deterministic output
  baseUrl: (process.env.OLLAMA_BASE_URL || 'http://localhost:11434').replace('/v1', ''),
  format: 'json',     // forces structured JSON — no prose, no markdown
});

temperature: 0 makes the output as deterministic as possible. format: 'json' tells Ollama to constrain the model to valid JSON output.

Note how per-request data reaches the graph node: knownOrder and sessionStart are passed through LangGraph's config.configurable, and only the current session's messages go to the model:

// src/chats/chats.service.ts
const finalState = await this.agent.invoke(
  { messages: [new HumanMessage(query)] },
  { configurable: { thread_id, knownOrder, sessionStart } },
);

// inside callModel — slice off everything before the session boundary
const formattedPrompt = await prompt.formatMessages({
  messages: state.messages.slice(sessionStart),
});

Context from a prior completed order never bleeds through.

Step 6: Parsing the Response (and Surviving Garbage)

LLMs don't always return clean JSON even when asked nicely. Some models wrap output in fences, others add a <think> block before answering. The extractJsonResponse helper handles all of it:

// src/chats/agent-utils.ts
export function extractJsonResponse(content: any): any {
  let text = /* normalize string | content-part array to string */;

  // Strip chain-of-thought tags (reasoning models like DeepSeek emit these)
  text = text.replace(/<think>[\s\S]*?<\/think>/gi, '').trim();

  // Try fenced JSON blocks first
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)\s*```/i);
  if (fenced?.[1]) return JSON.parse(fenced[1].trim());

  // Fall back to raw JSON
  const raw = text.match(/\{[\s\S]*\}/);
  if (raw?.[0]) return JSON.parse(raw[0]);

  throw new Error(`No JSON block found: ${text.slice(0, 200)}`);
}

And when even that fails? Don't return a 500. The service recovers with a polite retry message — the garbage is already checkpointed, and the rebuild loop in Step 3 tolerates it:

// src/chats/chats.service.ts
try {
  response = extractJsonResponse(lastMessage.content);
} catch (err) {
  console.error('Unparseable model output:', err);
  return {
    message: "Sorry, I didn't quite catch that — could you say it again?",
    current_order: knownOrder,
    suggestions: [],
    progress: 'in_progress',
  };
}

Together, the tolerant parser and this catch block keep the agent compatible with different Ollama models, and resilient to the bad day every model eventually has.
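
The parse-then-recover flow can be sketched as one pure function. This is a simplified illustration, not the repository's code: `extractJson` only handles `<think>` blocks and a raw `{...}` span, and the `AgentReply` type, the `parseOrFallback` name, and the extra check on the `message` field are assumptions added for the example.

```typescript
type AgentReply = {
  message: string;
  current_order: Record<string, unknown>;
  suggestions: string[];
  progress: 'in_progress' | 'completed';
};

// Simplified extraction: strip <think> blocks, then take the outermost {...} span.
function extractJson(text: string): unknown {
  const cleaned = text.replace(/<think>[\s\S]*?<\/think>/gi, '').trim();
  const match = cleaned.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON block found');
  return JSON.parse(match[0]);
}

// Never throws: bad model output becomes a polite retry built from known state.
function parseOrFallback(raw: string, knownOrder: Record<string, unknown>): AgentReply {
  try {
    const parsed = extractJson(raw) as Partial<AgentReply>;
    if (typeof parsed.message !== 'string') throw new Error('Missing message field');
    return {
      message: parsed.message,
      current_order: parsed.current_order ?? knownOrder,
      suggestions: Array.isArray(parsed.suggestions) ? parsed.suggestions : [],
      progress: parsed.progress === 'completed' ? 'completed' : 'in_progress',
    };
  } catch {
    return {
      message: "Sorry, I didn't quite catch that. Could you say it again?",
      current_order: knownOrder,
      suggestions: [],
      progress: 'in_progress',
    };
  }
}

const goodReply = parseOrFallback('<think>hmm</think>{"message":"Hi!","progress":"completed"}', {});
const fallbackReply = parseOrFallback('total garbage', { drink: 'Latte' });
```

Because the fallback reuses `knownOrder`, the order sidebar stays intact even on a turn where the model produced nothing usable.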

Step 7: Trust, but Verify

Here's the subtle production problem most agent tutorials skip: the model will claim values the user never said.

Two real failure modes from this app:

User types "I'd like a Latte" → model stores name: "Latte"
User starts a second order → model carries size: "Grande" over from the first one

Coffee Agent defends against both. A name is accepted only when the previous bot message was explicitly asking for one, and the value passes a deny-list check:

// src/chats/agent-utils.ts
export const INVALID_NAME_TERMS = new Set([
  ...DRINKS.map((d) => d.name.toLowerCase()),       // espresso, latte...
  ...SIZES.map((s) => s.name.toLowerCase()),        // tall, grande, venti...
  ...MILKS.map((m) => m.name.toLowerCase()),
  ...SYRUPS.map((s) => s.name.toLowerCase()),
  // ...plus ~90 English stop words: 'please', 'want', 'yes', 'hello'...
]);

export const isValidCustomerName = (name: string): boolean => {
  const n = name.trim().toLowerCase();
  return n.length >= 2 && !INVALID_NAME_TERMS.has(n);
};

Every other field is accepted only if the user actually said it in the current session:

// src/chats/agent-utils.ts
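// Note: val is lowercased here, but each message is compared as-is, so this
// presumably relies on the caller lowercasing humanMsgs first.
export const userMentioned = (humanMsgs: string[], val: string): boolean => {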
  const v = val.trim().toLowerCase();
  if (!v) return false;
  const firstWord = escapeRegex(v.split(/\s+/)[0]);
  const wordRe = new RegExp(`\\b${firstWord}\\b`);
  return humanMsgs.some((msg) => msg.includes(v) || wordRe.test(msg));
};

The matching is deliberately forgiving — "vanilla please" validates "Vanilla Syrup" — but whole-word: a bare "2" (a quantity) cannot validate "2% Milk". That last case is a real bug this function fixed.

// src/chats/chats.service.ts — applying the rules
if (k === 'name') {
  if (prevBotAskedForName && isValidCustomerName(strV)) knownOrder.name = v;
  continue;
}
if (userMentioned(humanMsgs, strV)) knownOrder[k] = v;
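
The whole-word matching can be exercised in isolation. In the sketch below, `escapeRegex` is a hypothetical implementation (the post uses the helper without showing its body), and `mentioned` lowercases each message itself so the example does not depend on what the caller does.

```typescript
// Hypothetical helper: escapes regex metacharacters in a literal string.
const escapeRegex = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Same matching rule as userMentioned above, with messages lowercased here.
function mentioned(humanMsgs: string[], value: string): boolean {
  const v = value.trim().toLowerCase();
  if (!v) return false;
  const firstWord = escapeRegex(v.split(/\s+/)[0] ?? '');
  const wordRe = new RegExp(`\\b${firstWord}\\b`);
  return humanMsgs.some((msg) => {
    const m = msg.toLowerCase();
    // Accept the full value, or its first word as a whole word.
    return m.includes(v) || wordRe.test(m);
  });
}

console.log(mentioned(['Vanilla please'], 'Vanilla Syrup')); // true
console.log(mentioned(['2'], '2% Milk'));                    // false
console.log(mentioned(['I want oatmeal'], 'Oat Milk'));      // false
```

The last case shows why the word boundary matters: a plain substring check on "oat" would have accepted "oatmeal".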

Step 8: Server-Side Guardrails

This is the most important part of the architecture. After the LLM responds, the service applies deterministic guards. When a guard fires, it completely overrides the model's output with a reliable, hardcoded response.

Guard 1: Block premature completion. If the model marks the order completed while fields are still missing, re-ask for the first missing field:

// src/chats/chats.service.ts
if (uncollected.length > 0 && response.progress === 'completed') {
  const next = questionMap[uncollected[0]];
  const guardPayload = {
    message: next.msg,
    current_order: knownOrder,
    suggestions: next.opts,
    progress: 'in_progress',
  };
  await this.agent.updateState(threadConfig, {
    messages: [new AIMessage(JSON.stringify(guardPayload))],
  });
  return guardPayload;
}

Guard 2: Force completion when everything is ready. If all fields are collected but the model still says in_progress, flip it — and persist the completed marker so the next turn's session boundary calculation works:

if (uncollected.length === 0 && response.progress !== 'completed') {
  response.progress = 'completed';
  await this.agent.updateState(threadConfig, {
    messages: [new AIMessage(JSON.stringify({ ...response, progress: 'completed' }))],
  });
}

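Guard 3: Hard name requirement. An order can never move forward without a name, no matter what the model says. A premature "completed" is already blocked by Guard 1 (assuming name is among the uncollected fields, as it is in stillNeeded), so this guard covers the remaining in-progress turns and asks for the name before anything else: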

if (!knownOrder.name && response.progress !== 'completed') {
  const nameQ = questionMap.name;
  const nameGuard = {
    message: nameQ.msg,           // "Welcome! May I have your name for the order?"
    current_order: { ...knownOrder },
    suggestions: nameQ.opts,
    progress: 'in_progress' as const,
  };
  await this.agent.updateState(threadConfig, {
    messages: [new AIMessage(JSON.stringify(nameGuard))],
  });
  return nameGuard;
}

The fallback questions come from buildQuestionMap() in agent-utils.ts, which derives its suggestion chips from the same menu data the prompt uses — so the guards can never offer an option the menu doesn't have.

The pattern: LLM for natural language generation, deterministic code for business rule enforcement. Use each tool for what it's good at.
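
Stripped of persistence, the first two guards reduce to a small pure function, which is also what makes them easy to test. The sketch below is an illustration under assumptions: `applyGuards`, the `Reply` type, and the `FALLBACK_QUESTIONS` table are hypothetical names, and the real service additionally writes the override back with `updateState`.

```typescript
type Progress = 'in_progress' | 'completed';
type Reply = { message: string; progress: Progress };
type GuardResult = { reply: Reply; overridden: boolean };

// Hypothetical fallback questions; the app builds these with buildQuestionMap().
const FALLBACK_QUESTIONS: Record<string, string> = {
  name: 'May I have your name for the order?',
  size: 'What size would you like?',
};

function applyGuards(reply: Reply, uncollected: string[]): GuardResult {
  const firstMissing = uncollected[0];
  // Premature completion: discard the model's reply and re-ask for the first missing field.
  if (firstMissing !== undefined && reply.progress === 'completed') {
    const message = FALLBACK_QUESTIONS[firstMissing] ?? `Could you tell me your ${firstMissing}?`;
    return { reply: { message, progress: 'in_progress' }, overridden: true };
  }
  // Everything collected but the model did not say so: force completion.
  if (firstMissing === undefined && reply.progress !== 'completed') {
    return { reply: { ...reply, progress: 'completed' }, overridden: true };
  }
  // The model followed the rules: pass its reply through untouched.
  return { reply, overridden: false };
}

const blocked = applyGuards({ message: 'All done!', progress: 'completed' }, ['size']);
const forced = applyGuards({ message: 'Anything else?', progress: 'in_progress' }, []);
const untouched = applyGuards({ message: 'What size?', progress: 'in_progress' }, ['size']);
```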

Step 9: Saving the Order (With De-duplication)

When an order completes, it's saved via Mongoose. But the user might confirm the order and then keep chatting in the same thread — so the completion path could trigger twice. The de-dup check scans only the current session for an earlier completed marker:

// src/chats/chats.service.ts
const alreadySaved = allMessages.slice(sessionStart, -1).some((m) => {
  if (!(m instanceof AIMessage)) return false;
  try {
    return extractJsonResponse(m.content)?.progress === 'completed';
  } catch { return false; }
});

if (!alreadySaved) {
  await this.orderModel.create({
    name: knownOrder.name,
    drink: knownOrder.drink,
    size: nullStr(knownOrder.size),
    milk: nullStr(knownOrder.milk),
    syrup: nullStr(knownOrder.syrup),
    sweetener: nullStr(knownOrder.sweeteners),
    toppings: nullStr(knownOrder.toppings),
    quantity: safeQty(knownOrder.quantity),   // clamped to 1–10
  });
}

Why session-scoped? An earlier version scanned the whole thread — and a previous order's completion marker suppressed saving the new one. Scoping the scan to sessionStart fixed it. Session boundaries aren't just a prompt concern; they shape your persistence logic too.
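
The save path above leans on two small helpers, `nullStr` and `safeQty`, whose bodies are not shown in this post. The versions below are guesses at plausible implementations, written only from the behavior described here (null-like values become null, quantity is clamped to 1–10); the repository's versions may differ.

```typescript
// Hypothetical implementations; the post names these helpers but does not show them.

// Normalizes missing, empty, or literal "null" values to a real null.
function nullStr(value: unknown): string | null {
  if (value === null || value === undefined) return null;
  const s = String(value).trim();
  return s === '' || s.toLowerCase() === 'null' ? null : s;
}

// Coerces a model-supplied quantity to an integer clamped to the range 1..10.
function safeQty(value: unknown): number {
  const n = Math.floor(Number(value));
  if (!Number.isFinite(n)) return 1; // Non-numeric input falls back to one drink.
  return Math.min(10, Math.max(1, n));
}

console.log(nullStr('null'));  // null
console.log(safeQty('3'));     // 3
console.log(safeQty(99));      // 10
console.log(safeQty('lots'));  // 1
```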

The Order schema also carries a status field for the barista workflow:

// src/chats/schemas/order.schema.ts
export type OrderStatus = 'pending' | 'delivered';

@Schema()
export class Order {
  @Prop({ required: true }) name: string;
  @Prop({ required: true }) drink: string;
  @Prop({ default: null })  size: string;
  @Prop({ default: null })  milk: string;
  @Prop({ default: null })  syrup: string;
  @Prop({ default: null })  sweetener: string;
  @Prop({ default: null })  toppings: string;
  @Prop({ default: 1 })     quantity: number;
  @Prop({ default: 'pending' }) status: OrderStatus;
}

After saving, the service replaces the model's completion message with a deterministic summary built entirely from knownOrder — no LLM involved in the final confirmation:

Your order has been placed, Alex! ☕

Drink: Grande Latte
Milk: Oat Milk
Syrup: Vanilla Syrup
Sweetener: Classic Syrup
Toppings: Whipped Cream

Would you like to order something else?
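
A summary like this needs nothing more than string formatting. The sketch below is one way it could be built; `buildSummary` and the `FinalOrder` type are hypothetical names, and skipping unset customizations is an assumption about how the app treats drinks with fewer options.

```typescript
type FinalOrder = {
  name: string;
  drink: string;
  size?: string | null;
  milk?: string | null;
  syrup?: string | null;
  sweeteners?: string | null;
  toppings?: string | null;
};

function buildSummary(order: FinalOrder): string {
  const lines: string[] = [`Your order has been placed, ${order.name}! ☕`, ''];
  // The size, when present, is folded into the drink line ("Grande Latte").
  lines.push(`Drink: ${[order.size, order.drink].filter(Boolean).join(' ')}`);
  const optional: Array<[string, string | null | undefined]> = [
    ['Milk', order.milk],
    ['Syrup', order.syrup],
    ['Sweetener', order.sweeteners],
    ['Toppings', order.toppings],
  ];
  for (const [label, value] of optional) {
    if (value) lines.push(`${label}: ${value}`); // Assumption: unset fields are omitted.
  }
  lines.push('', 'Would you like to order something else?');
  return lines.join('\n');
}

const latteSummary = buildSummary({ name: 'Alex', drink: 'Latte', size: 'Grande', milk: 'Oat Milk' });
```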

Step 10: The API and Frontend

The NestJS controller exposes three endpoints:

// src/chats/chats.controller.ts
@Controller('chats')
export class ChatsController {
  @Get('orders')
  getOrders() { ... }              // Pending orders, newest first

  @Patch('orders/:id/deliver')
  deliverOrder(@Param('id') id: string) { ... }   // Soft delete: status → 'delivered'

  @Post('message/:thread')
  chatWithAgent(
    @Param('thread') threadId: string,
    @Body() { query }: SendChatDto,
  ) { ... }                        // Customer chat
}

Input is validated before it ever reaches the agent — class-validator rejects empty or oversized messages at the framework level:

// src/chats/dtos/send-chat.dto.ts
export class SendChatDto {
  @IsString()
  @IsNotEmpty()
  @MaxLength(500)
  query: string;
}

The React frontend creates or retrieves a thread ID from localStorage on load, so the conversation survives page refreshes:

// client/src/App.tsx
function getOrCreateThreadId(): string {
  const stored = localStorage.getItem('coffee_thread_id');
  if (stored) return stored;
  const id = crypto.randomUUID();
  localStorage.setItem('coffee_thread_id', id);
  return id;
}

Each agent response is structured JSON:

{
  "message": "Great choice! What milk would you like in your Latte?",
  "current_order": {
    "name": "Alex",
    "drink": "Latte",
    "size": "Grande",
    "milk": null,
    "syrup": null,
    "sweeteners": null,
    "toppings": null,
    "quantity": 1
  },
  "suggestions": ["Whole Milk", "Oat Milk", "Almond Milk", "No Milk"],
  "progress": "in_progress"
}

The frontend consumes this like any other API — it doesn't need to know anything about LLMs:

// client/src/api.ts
export async function sendMessage(threadId: string, query: string): Promise<ApiResponse> {
  const res = await fetch(`/api/chats/message/${threadId}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json();
}
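
`ApiResponse` is not shown in this post. Assuming it matches the JSON example above, a type plus a small runtime guard might look like the sketch below; the field types and the `isApiResponse` helper are inferred for illustration and may differ from the repository.

```typescript
// Assumed shape, inferred from the JSON example above.
interface ApiResponse {
  message: string;
  current_order: Record<string, string | number | null>;
  suggestions: string[];
  progress: 'in_progress' | 'completed';
}

// Runtime check so the UI fails loudly on an unexpected payload.
function isApiResponse(value: unknown): value is ApiResponse {
  if (typeof value !== 'object' || value === null) return false;
  const { message, current_order, suggestions, progress } = value as Record<string, unknown>;
  return (
    typeof message === 'string' &&
    typeof current_order === 'object' && current_order !== null &&
    Array.isArray(suggestions) && suggestions.every((s) => typeof s === 'string') &&
    (progress === 'in_progress' || progress === 'completed')
  );
}

const payload: unknown = JSON.parse(
  '{"message":"Hi","current_order":{"drink":"Latte"},"suggestions":["Oat Milk"],"progress":"in_progress"}',
);
console.log(isApiResponse(payload)); // true
```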

Bonus: Yes, You Can Unit Test an Agent App

"How do I test something with an LLM in the middle?" You don't test the LLM. You extract every deterministic decision into pure functions and test those.

That's exactly what agent-utils.ts is: JSON extraction, name validation, value verification, quantity clamping, and the guard question map — all pure, all covered by agent-utils.spec.ts:

// src/chats/agent-utils.spec.ts
it('strips <think> blocks from reasoning models', () => {
  expect(extractJsonResponse('<think>let me reason</think>{"a":1}')).toEqual({ a: 1 });
});

it('rejects menu vocabulary and stop words', () => {
  expect(isValidCustomerName('Latte')).toBe(false);
  expect(isValidCustomerName('oat milk')).toBe(false);
});

it('does not let short fragments validate unrelated values', () => {
  // "2" (a quantity) must not validate "2% Milk"
  expect(userMentioned(['2'], '2% Milk')).toBe(false);
});

yarn test

No mocking, no LLM calls, runs in milliseconds. The testable surface of an agent app is the deterministic layer, which is also where most of the bugs you can fix will live. You evaluate the LLM's prose quality by talking to it; you verify your business rules with Jest.

The Shift: From Flowchart to Brain

A traditional ordering app looks something like this internally:

State: WAITING_FOR_NAME
  if userSaysName → State: WAITING_FOR_DRINK
State: WAITING_FOR_DRINK
  if userSaysDrink → State: WAITING_FOR_SIZE
  if userSaysQuestion → State: SHOW_MENU_HELP
...and so on for every combination

This is brittle. What if the user gives their name and drink in the same message? What if they ask "what's the difference between grande and venti?" mid-order? Every edge case needs a new branch.

Coffee Agent handles all of this naturally because the LLM understands intent, not just exact strings. When a user says "Actually, make it a venti" three turns in, Maya understands the correction. When they try to finish without giving a name, the guard steps in — not with an ugly error, but with a natural redirect.

The brain doesn't replace your business logic — it handles the messy human communication so your business logic doesn't have to.

Running It Yourself

# 1. Clone the repo
git clone https://github.com/pkhamdee/coffee-agent.git
cd coffee-agent

# 2. Install Ollama (Homebrew shown; other platforms have installers at ollama.com) and pull the model
brew install ollama
ollama pull qwen2.5:7b

# 3. Start MongoDB
cd mongodb && docker compose up -d && cd ..

# 4. Configure environment — create .env in the project root:
# MONGO_URI=mongodb://admin:password123@localhost:27017/?retryWrites=true&w=majority
# OLLAMA_BASE_URL=http://localhost:11434/v1
# OLLAMA_MODEL=qwen2.5:7b

# 5. Start the backend
yarn install
yarn start:dev    # NestJS on :3000

# 6. Start the frontend (new terminal)
cd client
yarn install
yarn dev          # React on :5173

Open http://localhost:5173 to chat with Maya. Open http://localhost:5173/dashboard for the barista order queue. The backend fails fast with a clear error if MONGO_URI is missing — no silent half-broken startup.

Exercises to Make It Yours

The fastest way to learn this architecture is to break it and fix it:

Add a drink. Add "Matcha Latte" to DRINKS in menu_data.ts with supportTopping: false. Notice you change zero prompt or service code — the capability flags drive everything.
Swap the model. Set OLLAMA_MODEL=llama3.1:8b in .env and compare how well it follows the JSON format. extractJsonResponse absorbs the differences.
Break a guard. Comment out Guard 3 (the name requirement) and try ordering with "I'm Latte". Watch the bad name land in the database — then put the guard back.
Run the tests. Run yarn test, then add a case: does isValidCustomerName('Maya') pass? Should it?
Add a field. Add a "for here or to go?" question. You'll touch stillNeeded, buildQuestionMap, the Order schema, and the prompt — a tour of the whole pipeline.

Key Design Patterns to Take Away

These patterns apply to any agent application you build:

1. Inject state into every prompt, explicitly. Don't ask the LLM to remember things across turns. Put the current state directly in the system prompt every time — knownOrderJson is injected on every single request.

2. Rebuild state from history; don't trust the model. For anything that needs to be accurate, reconstruct it from your authoritative history. Coffee Agent rebuilds knownOrder by replaying messages rather than trusting what the model claims.

3. Verify before you accept. A model-claimed value only enters knownOrder if the user actually said it (userMentioned) or the bot explicitly asked for it (prevBotAskedForName). This kills hallucinated names and cross-order bleed in one move.

4. LLM for generation, code for enforcement. Use the model to write natural, friendly responses. Use deterministic guards to enforce business rules — and override the model when it violates one.

5. Track session boundaries within a thread. One thread can span multiple orders. sessionStart scopes the prompt context, the value-acceptance rules, and the save de-duplication.

6. Structure every output. The model always returns JSON, so React consumes agent responses exactly like any other API. Recover gracefully when parsing fails — a retry message, not a 500.

7. Extract pure logic and test it. Everything deterministic lives in agent-utils.ts with Jest coverage. The LLM is the only thing you can't unit test — so make it the only thing you don't.

Summary

Agents aren't magic, and they aren't a replacement for well-written code. They're a different way to handle the parts of an application that are hard to express as a flowchart.

What the Coffee Agent demo shows in concrete terms:

Lesson	Where it lives
An agent is a reasoning loop driven by injected state	`callModel()` + dynamic system prompt
LangGraph + MongoDB checkpoints give persistence for free	`MongoDBSaver`, compiled once in the constructor
A local 7B model handles multi-turn ordering reliably	Ollama + `qwen2.5:7b`, `temperature: 0`, `format: 'json'`
Guardrails are the most important code in the app	Guards 1–3 in `chats.service.ts`
Never accept a value the user didn't say	`userMentioned()`, `isValidCustomerName()`
The deterministic layer is fully unit-testable	`agent-utils.ts` + `agent-utils.spec.ts`
The frontend doesn't care there's an LLM	Plain JSON API, validated DTOs

The shift from sequential code to agent-driven logic isn't all-or-nothing. You can add a single agent node to an existing NestJS or Express application — handling one complex, conversational feature — while keeping everything else exactly as it was.

Pick one part of your application where users navigate a messy, multi-step interaction. Replace the flowchart with a brain. Coffee Agent is the blueprint: clone it, run it, break it, make it yours.

Questions or discussion? Connect on LinkedIn, X or reach out via email.
