
AI Agent Application Demo: Putting a Brain Inside Your App

Source code: github.com/pkhamdee/coffee-agent

There's a quiet revolution happening in how we write software. For decades, we've built applications the same way: write a function, call the next function, handle each case with an if statement, repeat. The logic is explicit, deterministic, and completely predictable — a flowchart carved into code.

That model still works. But it has a hard ceiling.

When a user wants to do something that doesn't fit neatly into your flowchart — when they say something ambiguous, change their mind mid-conversation, or combine requests in ways you didn't anticipate — the rigid-logic app breaks down. You end up writing more and more special-case handling until the code becomes unmaintainable.

AI agents flip this model. Instead of programming every decision upfront, you give your application a reasoning engine — a brain — and let it figure out what to do. The application stops being a flowchart and starts being a collaborator.

This post walks through a real, runnable example: a coffee shop ordering chatbot called Coffee Agent. It's a full-stack app built with NestJS, React, LangGraph, and a local LLM running on Ollama. By the time you finish reading, you'll understand exactly what an agent is, why this architecture is powerful, and how to build one yourself.


What Is an Agent, Really?

The word "agent" gets thrown around a lot. Let's be precise.

In traditional software, control flow is hardcoded. A shopping cart app literally has if (item.inCart) { removeFromCart() } else { addToCart() }. The developer made that decision — the code just executes it.

In an agent-based application, you replace some of that hardcoded logic with a language model (LLM) that reasons about what to do next. The LLM reads the current state of the world (the conversation, the data, the context), thinks, and produces a structured response that drives the application forward.

An agent has three core components:

| Component | What it is | In Coffee Agent |
| --- | --- | --- |
| Brain | The LLM that reasons | Ollama running qwen2.5:7b locally |
| Memory | State that persists across turns | MongoDB checkpoints via LangGraph |
| Environment | Context and rules the agent operates in | System prompt with menu, order state, and guardrails |

The key insight is this: you don't tell the agent exactly what to do — you tell it what it knows and what it should accomplish, then let it reason its way there.
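
To make that concrete, here is a minimal sketch of a single agent turn. Every name in it (`AgentContext`, `runAgentTurn`, `stubReasoner`) is hypothetical and not taken from the Coffee Agent repo, and the stub reasoner merely stands in for a real LLM call so the example runs on its own.

```typescript
// Hypothetical names throughout; the "reasoner" stands in for an LLM call.
type AgentContext = { known: Record<string, string>; goal: string; userText: string };
type AgentDecision = { message: string; updates: Record<string, string> };
type Reasoner = (ctx: AgentContext) => AgentDecision;

// One agent turn: hand the reasoner what is known and what to accomplish,
// then merge whatever it decided back into application state.
function runAgentTurn(
  reason: Reasoner,
  ctx: AgentContext,
): { reply: string; known: Record<string, string> } {
  const decision = reason(ctx);
  return { reply: decision.message, known: { ...ctx.known, ...decision.updates } };
}

// A stub reasoner so the sketch runs without a model.
const stubReasoner: Reasoner = (ctx) =>
  ctx.userText.toLowerCase().includes('latte')
    ? { message: 'One Latte, noted. What size?', updates: { drink: 'Latte' } }
    : { message: 'What can I get you?', updates: {} };

const agentTurn = runAgentTurn(stubReasoner, {
  known: { name: 'Alex' },
  goal: 'collect a drink order',
  userText: "I'd like a latte",
});
console.log(agentTurn.reply, agentTurn.known);
```

The application code never branches on what the user said. It only supplies context and merges the structured decision that comes back.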


The Coffee Agent: What It Does

Coffee Agent is a conversational barista. You open the app, type "I'd like something with oat milk," and an AI named Maya guides you through placing a coffee order. She asks for your name, confirms your drink, collects customizations one at a time, and when everything is in order, saves it to the database.

The app has two views:

  • Customer chat — a React UI where you talk to Maya
  • Barista dashboard (BaristaDashboard.tsx) — a staff view where completed orders appear and can be marked delivered via DELETE /chats/orders/:id

Here's what makes the architecture interesting:

  • Conversation state is fully persistent — close the tab, come back, Maya remembers everything
  • After a completed order, the same thread can start a fresh order session without losing history
  • Every customization option is drink-aware — an Espresso won't be asked about size or toppings, a Frappuccino will
  • The LLM runs entirely locally — no cloud API key required

The Tech Stack

Backend (src/)

| Package | Version | Purpose |
| --- | --- | --- |
| @nestjs/common | ^10 | HTTP framework |
| @langchain/langgraph | ^0.4.8 | Agent orchestration |
| @langchain/langgraph-checkpoint-mongodb | ^0.1.1 | Conversation persistence |
| @langchain/ollama | ^0.2.4 | Local LLM provider |
| @langchain/openai / @langchain/google-genai | included | Alternative providers |
| langchain | ^0.3.33 | Prompt templates, output parsers |
| mongoose | ^8.18 | MongoDB ODM |
| zod | ^4.1.8 | Schema validation |

Frontend (client/)

| Package | Purpose |
| --- | --- |
| React 19 + Vite | Chat UI |
| Tailwind CSS 4 | Styling |

Infrastructure

  • Ollama on localhost:11434 (local LLM)
  • MongoDB via Docker Compose (mongodb/docker-compose.yml)

Architecture: One Node, All the Intelligence

Here's the full request lifecycle before we go deep into each piece:

User types a message
React → POST /api/chats/message/:threadId
NestJS ChatsController.chatWithAgent()
   ┌──────────────────────────────────────┐
   │  chatWithAgent() in chats.service.ts  │
   │                                      │
   │  1. Connect to MongoDB               │
   │  2. Load previous state from graph   │
   │  3. Find session start boundary      │
   │  4. Rebuild order from history       │
   │  5. Compute uncollected fields       │
   │  6. Build dynamic system prompt      │
   │  7. Call Ollama LLM (JSON mode)      │
   │  8. Parse and validate response      │
   │  9. Apply server-side guards         │
   │  10. Save order if completed         │
   └──────────────────────────────────────┘
Structured JSON response → React updates chat + sidebar

Notice there are no branches in the graph. The intelligence lives entirely inside one function.


Step 1: The LangGraph Graph

src/chats/chats.service.ts defines the agent as a LangGraph state machine. The graph definition itself is a single chained statement:

const graph = new StateGraph(graphState)
  .addNode('agent', (states) => callModal(states))
  .addEdge(START, 'agent')
  .addEdge('agent', END);

const app = graph.compile({ checkpointer });

One node (agent), two edges. That's it. The complexity lives inside callModal, not in the graph topology.

The checkpointer serializes the full conversation — every message, in order — to MongoDB after each turn:

const checkpointer = new MongoDBSaver({ client, dbName: database_name });

When the next request arrives, LangGraph rehydrates the exact state and continues from where it left off. Conversation persistence comes for free.
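
Conceptually, a checkpointer is a store keyed by thread ID that is loaded at the start of a turn and appended to at the end. The sketch below illustrates only that idea with an in-memory map. It is not the MongoDBSaver API, and the names `InMemoryCheckpoints`, `load`, and `append` are invented for illustration.

```typescript
// Conceptual stand-in for a checkpointer: NOT the MongoDBSaver API.
type StoredMsg = { role: 'human' | 'ai'; content: string };

class InMemoryCheckpoints {
  private threads = new Map<string, StoredMsg[]>();

  // Return a copy of the saved messages for a thread (empty for a new thread).
  load(threadId: string): StoredMsg[] {
    return [...(this.threads.get(threadId) ?? [])];
  }

  // Append this turn's messages to the thread's saved state.
  append(threadId: string, msgs: StoredMsg[]): void {
    this.threads.set(threadId, [...this.load(threadId), ...msgs]);
  }
}

const checkpoints = new InMemoryCheckpoints();
checkpoints.append('thread-1', [
  { role: 'human', content: 'Hi' },
  { role: 'ai', content: 'Welcome!' },
]);
checkpoints.append('thread-1', [{ role: 'human', content: 'A latte please' }]);
console.log(checkpoints.load('thread-1').length); // 3
```

Swapping the map for a database is what turns this from per-process memory into state that survives restarts and closed tabs.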


Step 2: Session Boundary Detection

Here's a design you won't find in most tutorials: the service tracks session boundaries within a single thread.

When a customer finishes one order and starts another in the same conversation, the agent needs to ignore the first order's context. The code finds the boundary by scanning backwards through message history for the last "completed" marker:

// src/chats/chats.service.ts
let sessionStart = 0;
for (let i = prevMessages.length - 1; i >= 0; i--) {
  const m = prevMessages[i];
  if (!(m instanceof AIMessage)) continue;
  try {
    if (extractJsonResponse(m.content)?.progress === 'completed') {
      sessionStart = i + 1;   // New session starts after the last completed order
      break;
    }
  } catch { /* not a structured message */ }
}

Everything before sessionStart is history. Everything from sessionStart forward is the current order in progress. The model only sees the current session's messages, so it never confuses order A details with order B.
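
Here is the same backward scan as a self-contained sketch you can run without LangChain. It assumes plain `{ role, content }` objects in place of the AIMessage and HumanMessage classes, and `readProgress` and `findSessionStart` are illustrative names rather than the repo's.

```typescript
type ThreadMsg = { role: 'human' | 'ai'; content: string };

// Read the "progress" field from an AI message, or undefined if it is not JSON.
function readProgress(m: ThreadMsg): string | undefined {
  if (m.role !== 'ai') return undefined;
  try {
    const parsed: unknown = JSON.parse(m.content);
    if (typeof parsed === 'object' && parsed !== null && 'progress' in parsed) {
      const p = (parsed as { progress: unknown }).progress;
      return typeof p === 'string' ? p : undefined;
    }
  } catch {
    // not a structured message
  }
  return undefined;
}

// Scan backwards; the session starts right after the last completed order.
function findSessionStart(msgs: ThreadMsg[]): number {
  for (let i = msgs.length - 1; i >= 0; i--) {
    if (readProgress(msgs[i]) === 'completed') return i + 1;
  }
  return 0;
}

const threadMsgs: ThreadMsg[] = [
  { role: 'human', content: 'A latte for Alex' },
  { role: 'ai', content: JSON.stringify({ message: 'Placed!', progress: 'completed' }) },
  { role: 'human', content: 'One more, an espresso' },
  { role: 'ai', content: JSON.stringify({ message: 'Sure!', progress: 'in_progress' }) },
];
const sessionStartIdx = findSessionStart(threadMsgs);
console.log(sessionStartIdx);                          // 2
console.log(threadMsgs.slice(sessionStartIdx).length); // 2
```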


Step 3: Rebuilding Order State from History

Rather than trusting the LLM to maintain a running order object, the server rebuilds it from scratch every turn by replaying the session's message history.

AI messages supply confirmed field values. Human messages supply explicit declines — because users often say "no milk please" in a way the model might store as null instead of "No Milk":

// src/chats/chats.service.ts
const NO_DECLINE_PATTERNS: Array<[string, string, string]> = [
  ['toppings',   'no topping',   'No Toppings'],
  ['milk',       'no milk',      'No Milk'],
  ['syrup',      'no syrup',     'No Syrup'],
  ['sweeteners', 'no sweetener', 'No Sweetener'],
];

const knownOrder: Record<string, any> = {};
for (const m of prevMessages.slice(sessionStart)) {
  if (m instanceof HumanMessage) {
    const h = String(m.content).toLowerCase();
    for (const [field, pattern, canonical] of NO_DECLINE_PATTERNS) {
      if (h.includes(pattern)) knownOrder[field] = canonical;
    }
  } else if (m instanceof AIMessage) {
    try {
      const co = extractJsonResponse(m.content)?.current_order;
      if (!co) continue;
      for (const [k, v] of Object.entries(co)) {
        if (v === null || v === 'null' || !String(v).trim()) continue;
        knownOrder[k] = v;
      }
    } catch { /* skip */ }
  }
}

The result is knownOrder — an authoritative snapshot of what has been confirmed, independent of what the model might claim.
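
To see the replay in action, here is a trimmed, dependency-free sketch with sample data. It assumes plain `{ role, content }` messages and AI content that is already bare JSON. `replayOrder` and `DECLINES` are illustrative names, and only two of the decline patterns are included.

```typescript
type ReplayMsg = { role: 'human' | 'ai'; content: string };

// [field, phrase to look for in user text, canonical stored value]
const DECLINES: Array<[string, string, string]> = [
  ['milk', 'no milk', 'No Milk'],
  ['syrup', 'no syrup', 'No Syrup'],
];

function replayOrder(msgs: ReplayMsg[]): Record<string, unknown> {
  const known: Record<string, unknown> = {};
  for (const m of msgs) {
    if (m.role === 'human') {
      // Human turns only contribute explicit declines.
      const text = m.content.toLowerCase();
      for (const [field, pattern, canonical] of DECLINES) {
        if (text.includes(pattern)) known[field] = canonical;
      }
      continue;
    }
    let order: unknown;
    try {
      order = (JSON.parse(m.content) as { current_order?: unknown }).current_order;
    } catch {
      continue; // not structured JSON
    }
    if (typeof order !== 'object' || order === null) continue;
    // AI turns contribute every non-empty field; nulls never erase a known value.
    for (const [k, v] of Object.entries(order)) {
      if (v === null || v === 'null' || !String(v).trim()) continue;
      known[k] = v;
    }
  }
  return known;
}

const replayed = replayOrder([
  { role: 'ai', content: JSON.stringify({ current_order: { name: 'Alex', drink: 'Latte', milk: null } }) },
  { role: 'human', content: 'No milk please, thanks' },
  { role: 'ai', content: JSON.stringify({ current_order: { name: 'Alex', drink: 'Latte', milk: null, size: 'Grande' } }) },
]);
console.log(replayed);
```

Note that the second AI message still reports `milk: null`, yet the rebuilt order keeps "No Milk" because null values are skipped rather than applied.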


Step 4: The Invalid Name Guard

One subtle production problem: users often lead with their drink, not their name. If someone types "I'd like a Latte," the model may incorrectly store name: "Latte". The code blocks this with a deny-list of menu vocabulary:

// src/chats/chats.service.ts
const INVALID_NAME_TERMS = new Set([
  ...DRINKS.map((d) => d.name.toLowerCase()),   // espresso, latte, cappuccino...
  'tall', 'grande', 'venti', 'trenta',
  'whole milk', 'oat milk', 'almond milk', 'no milk',
  'vanilla syrup', 'caramel syrup', 'no syrup',
  'yes', 'no', 'ok', 'sure', 'please', 'hi', 'hello',
  'want', 'would', 'like', 'need', 'order', 'drink', 'coffee',
  // ...and more common English words
]);

A name value from the model is accepted only when three conditions are all true:

1. The previous bot message was explicitly asking for a name
2. The value is at least 2 characters
3. The value is not in INVALID_NAME_TERMS
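
A minimal sketch of that three-part check might look like the following. The post does not show how the service detects that the previous bot message asked for a name, so the sketch takes it as a boolean argument. `acceptName` and the shortened `DENY_TERMS` set are illustrative, not the repo's code.

```typescript
// Shortened deny-list for illustration; the real set is much larger.
const DENY_TERMS = new Set(['latte', 'espresso', 'grande', 'oat milk', 'yes', 'no', 'hi']);

// Returns the cleaned name if all three conditions hold, otherwise null.
// `botAskedForName` is an assumed input: how it is derived is not shown in the post.
function acceptName(candidate: unknown, botAskedForName: boolean): string | null {
  if (!botAskedForName || typeof candidate !== 'string') return null; // condition 1
  const trimmed = candidate.trim();
  if (trimmed.length < 2) return null;                                // condition 2
  if (DENY_TERMS.has(trimmed.toLowerCase())) return null;             // condition 3
  return trimmed;
}

console.log(acceptName('Alex', true));  // Alex
console.log(acceptName('Latte', true)); // null
console.log(acceptName('Alex', false)); // null
```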


Step 5: Building the Dynamic System Prompt

This is where the "brain" gets its instructions. The system prompt is rebuilt every turn with three dynamic sections injected directly:

// src/chats/chats.service.ts
const prompt = ChatPromptTemplate.fromMessages([
  {
    role: 'system',
    content: `
You are Maya, a friendly and enthusiastic barista at a cozy café...

**AVAILABLE DRINKS AND OPTIONS**
${DRINKS.map((drink) => `- ${createDrinkItemSummary(drink)}`).join('\n')}
Sizes: ${createSizesSummary()}
Milks: ${createAvailableMilksSummary()}
Syrups: ${createSyrupsSummary()}
Sweeteners: ${createSweetenersSummary()}
Toppings: ${availableToppingsSummary()}

**WHAT YOU ALREADY KNOW ABOUT THIS ORDER (do NOT ask again)**
${knownOrderJson}

**YOUR NEXT TASK**
${nextFieldInstruction}

**OUTPUT FORMAT — RETURN ONLY THIS JSON**
{{"message":"...","current_order":{...},"suggestions":[...],"progress":"in_progress"}}
    `,
  },
  new MessagesPlaceholder('messages'),
]);

The nextFieldInstruction is computed from the drink's capability flags. Each drink in src/lib/utils/constants/menu_data.ts declares exactly what it supports:

// src/lib/utils/constants/menu_data.ts
export const DRINKS: Drink[] = [
  {
    name: 'Espresso',
    supportMilk: false, supportSweeteners: true,
    supportSyrup: true, supportTopping: false, supportSize: false,
  },
  {
    name: 'Latte',
    supportMilk: true, supportSweeteners: true,
    supportSyrup: true, supportTopping: true, supportSize: true,
  },
  {
    name: 'Cappuccino',
    supportMilk: true, supportSweeteners: true,
    supportSyrup: true, supportTopping: true, supportSize: true,
  },
  {
    name: 'Cold Brew',
    supportMilk: true, supportSweeteners: true,
    supportSyrup: true, supportTopping: false, supportSize: true,
  },
  {
    name: 'Frappuccino',
    supportMilk: true, supportSweeteners: true,
    supportSyrup: true, supportTopping: true, supportSize: true,
  },
];

The agent uses these flags to compute only the fields that still need collecting for the chosen drink, then injects them as an ordered list into the prompt. Maya is told: "Ask for exactly this, in this order, one at a time."
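
As a rough sketch of that computation, the function below walks the capability flags and returns the fields still missing. The collection order in `FIELD_FLAGS` is an assumption made for illustration, as are the names `DrinkCaps` and `computeUncollected`. The repo may order or name these differently.

```typescript
type DrinkCaps = {
  name: string;
  supportMilk: boolean; supportSweeteners: boolean;
  supportSyrup: boolean; supportTopping: boolean; supportSize: boolean;
};

// Assumed collection order; the repo may order these differently.
const FIELD_FLAGS: Array<[string, keyof DrinkCaps]> = [
  ['size', 'supportSize'],
  ['milk', 'supportMilk'],
  ['syrup', 'supportSyrup'],
  ['sweeteners', 'supportSweeteners'],
  ['toppings', 'supportTopping'],
];

// List the fields that still need collecting, skipping any the drink does not support.
function computeUncollected(drink: DrinkCaps | undefined, known: Record<string, unknown>): string[] {
  const missing: string[] = [];
  if (!known['name']) missing.push('name');
  if (!drink) return [...missing, 'drink'];
  for (const [field, flag] of FIELD_FLAGS) {
    if (drink[flag] && !known[field]) missing.push(field);
  }
  return missing;
}

const espressoCaps: DrinkCaps = {
  name: 'Espresso',
  supportMilk: false, supportSweeteners: true,
  supportSyrup: true, supportTopping: false, supportSize: false,
};
console.log(computeUncollected(espressoCaps, { name: 'Alex', drink: 'Espresso' }));
// [ 'syrup', 'sweeteners' ]
```

The first entry of the returned list is what drives the "next task" section of the prompt, and an empty list means the order is ready to complete.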


Step 6: Calling the LLM

The LLM call is straightforward. The service uses ChatOllama with two key settings:

// src/chats/chats.service.ts
const chat = new ChatOllama({
  model: process.env.OLLAMA_MODEL || 'qwen2.5:7b',
  temperature: 0,     // deterministic output
  baseUrl: OLLAMA_BASE_URL.replace('/v1', ''),
  format: 'json',     // forces structured JSON — no prose, no markdown
});

const result = await chat.invoke(formattedPrompt);

temperature: 0 makes the output as deterministic as possible. format: 'json' asks Ollama to constrain the reply to syntactically valid JSON. It does not enforce any particular schema, which is why the prompt spells out the expected shape. Note that only the current session's messages are passed to the model, not the full thread history, so context from a prior completed order never bleeds through.

The repo also includes @langchain/openai and @langchain/google-genai as dependencies, so swapping the provider should mostly come down to constructing a different chat model class and supplying its credentials.
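
If you are curious what those two settings look like without the LangChain wrapper, here is a sketch of the request body for Ollama's native /api/chat endpoint. The helper name `buildOllamaChatBody` is made up for this example. The native API lives at the server root rather than under the OpenAI-compatible /v1 path, which is likely why the service strips '/v1' from the base URL.

```typescript
type OllamaChatMsg = { role: 'system' | 'user' | 'assistant'; content: string };

// Build the JSON body for POST {baseUrl}/api/chat on a local Ollama server.
function buildOllamaChatBody(model: string, messages: OllamaChatMsg[]) {
  return {
    model,
    messages,
    stream: false,               // return one complete reply instead of chunks
    format: 'json',              // JSON mode: the reply must be valid JSON
    options: { temperature: 0 }, // as deterministic as the model allows
  };
}

const ollamaBody = buildOllamaChatBody('qwen2.5:7b', [
  { role: 'system', content: 'Reply only with JSON.' },
  { role: 'user', content: "I'd like a latte" },
]);
console.log(JSON.stringify(ollamaBody, null, 2));
```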


Step 7: Parsing the LLM Response

LLMs don't always return clean JSON even when asked nicely. Some models wrap output in fences, others add a <think> block before answering. The extractJsonResponse helper strips all of that:

// src/chats/chats.service.ts
function extractJsonResponse(content: any): any {
  let text = /* normalize content to string */;

  // Strip chain-of-thought tags (models like DeepSeek use these)
  text = text.replace(/<think>[\s\S]*?<\/think>/gi, '').trim();

  // Try fenced JSON blocks first
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)\s*```/i);
  if (fenced?.[1]) return JSON.parse(fenced[1].trim());

  // Fall back to raw JSON
  const raw = text.match(/\{[\s\S]*\}/);
  if (raw?.[0]) return JSON.parse(raw[0]);

  throw new Error(`No JSON block found: ${text.slice(0, 200)}`);
}

This makes the agent compatible with different Ollama models without changing any other code.
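
Here is a runnable variant of the helper with a few sample inputs. Two things in it are assumptions: the normalization step (the post elides the repo's version, so this one simply serializes non-string content) and the name `extractJson`. The fence marker is built with `repeat` purely so the example can be embedded in a code block.

```typescript
const FENCE = '`'.repeat(3); // three backticks, built indirectly for embedding

function extractJson(content: unknown): unknown {
  // Assumption: non-string content is serialized; the repo's normalization is not shown.
  let text = typeof content === 'string' ? content : String(JSON.stringify(content));

  // Strip chain-of-thought tags
  text = text.replace(/<think>[\s\S]*?<\/think>/gi, '').trim();

  // Try fenced JSON blocks first
  const fencedRe = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)\\s*' + FENCE, 'i');
  const fenced = text.match(fencedRe);
  if (fenced?.[1]) return JSON.parse(fenced[1].trim());

  // Fall back to the outermost brace-delimited span
  const raw = text.match(/\{[\s\S]*\}/);
  if (raw?.[0]) return JSON.parse(raw[0]);

  throw new Error(`No JSON block found: ${text.slice(0, 200)}`);
}

const fromThink = extractJson('<think>user wants milk</think>{"progress":"in_progress"}');
const fromFence = extractJson(`Sure!\n${FENCE}json\n{"progress":"completed"}\n${FENCE}`);
console.log(fromThink, fromFence);
```

One caveat worth knowing: the raw fallback grabs everything from the first `{` to the last `}`, so prose containing stray braces around the JSON can still make parsing fail.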


Step 8: Server-Side Guardrails

This is the most important part of the architecture, and the part most agent tutorials skip over.

After the LLM responds, the service applies a sequence of deterministic guards. When a guard fires, it overrides the model's output, either replacing the whole reply with a reliable, hardcoded response or correcting a single field.

Guard 1: Block premature completion

If the model marks the order completed before all required fields are filled, the guard catches it and re-asks for the first missing field:

// src/chats/chats.service.ts
if (uncollected.length > 0 && response.progress === 'completed') {
  const next = questionMap[uncollected[0]];
  const guardPayload = {
    message: next.msg,
    current_order: knownOrder,
    suggestions: next.opts,
    progress: 'in_progress',
  };
  await app.updateState(threadConfig, {
    messages: [new AIMessage(JSON.stringify(guardPayload))],
  });
  return guardPayload;
}

Guard 2: Force completion when all fields are ready

If all fields are collected but the model still returns in_progress, the guard flips it to completed and persists the marker so the next session boundary calculation works correctly:

if (uncollected.length === 0 && response.progress !== 'completed') {
  response.progress = 'completed';
  await app.updateState(threadConfig, {
    messages: [new AIMessage(JSON.stringify({ ...response, progress: 'completed' }))],
  });
}

Guard 3: Hard name requirement

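An order can never complete without a name. Whenever the name is still missing and the order is in progress, this guard replaces the reply with a request for it. A premature completed without a name is presumably caught by Guard 1, provided name counts as an uncollected field: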

if (!knownOrder.name && response.progress !== 'completed') {
  const nameGuard = {
    message: 'Welcome! May I have your name for the order?',
    current_order: { ...knownOrder },
    suggestions: [],
    progress: 'in_progress',
  };
  await app.updateState(threadConfig, {
    messages: [new AIMessage(JSON.stringify(nameGuard))],
  });
  return nameGuard;
}

The pattern is: LLM for natural language generation, deterministic code for business rule enforcement. Use each tool for what it's good at.
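
One way to keep guards like these testable is to express them as a pure function that takes the model's reply and returns the reply to actually send, leaving persistence to the caller. The sketch below does that. The guard order follows the order in this post, which may differ from the repo, and `applyGuards`, `GuardReply`, and `GuardQuestion` are illustrative names.

```typescript
type GuardReply = {
  message: string;
  current_order: Record<string, unknown>;
  suggestions: string[];
  progress: 'in_progress' | 'completed';
};
type GuardQuestion = { msg: string; opts: string[] };

function applyGuards(
  reply: GuardReply,
  known: Record<string, unknown>,
  uncollected: string[],
  questions: Record<string, GuardQuestion>,
): GuardReply {
  // Guard 1: the model claims completion while fields are still missing.
  if (uncollected.length > 0 && reply.progress === 'completed') {
    const next = questions[uncollected[0]] ?? { msg: 'What else can I get for you?', opts: [] };
    return { message: next.msg, current_order: known, suggestions: next.opts, progress: 'in_progress' };
  }
  // Guard 2: everything is collected but the model has not marked completion.
  if (uncollected.length === 0 && reply.progress !== 'completed') {
    return { ...reply, progress: 'completed' };
  }
  // Guard 3: the name is still missing, so ask for it before anything else.
  if (!known['name'] && reply.progress !== 'completed') {
    return {
      message: 'Welcome! May I have your name for the order?',
      current_order: { ...known },
      suggestions: [],
      progress: 'in_progress',
    };
  }
  return reply;
}

const guardQuestions: Record<string, GuardQuestion> = {
  milk: { msg: 'What milk would you like?', opts: ['Oat Milk', 'No Milk'] },
};
const guarded = applyGuards(
  { message: 'All done!', current_order: {}, suggestions: [], progress: 'completed' },
  { name: 'Alex', drink: 'Latte' },
  ['milk'],
  guardQuestions,
);
console.log(guarded.message, guarded.progress); // What milk would you like? in_progress
```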


Step 9: Saving the Order (With De-duplication)

When an order completes, it's saved to MongoDB via Mongoose. But there's a subtlety: because the user might confirm the order and then ask "what did I order?" in the same thread, the completion logic could be triggered twice. The code de-duplicates by scanning history:

// src/chats/chats.service.ts
let lastCompletedIdx = -1;
let hasInProgressAfter = false;
allMessages.slice(0, -1).forEach((m, i) => {
  if (!(m instanceof AIMessage)) return;
  try {
    const p = extractJsonResponse(m.content)?.progress;
    if (p === 'completed') { lastCompletedIdx = i; hasInProgressAfter = false; }
    else if (p === 'in_progress' && lastCompletedIdx >= 0) hasInProgressAfter = true;
  } catch { /* skip */ }
});
const alreadySaved = lastCompletedIdx >= 0 && !hasInProgressAfter;

if (!alreadySaved) {
  await this.orderModel.create({ ... });
}
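
The rule is easier to reason about once it is isolated from message parsing. The sketch below takes the progress markers of all earlier AI messages (everything except the reply just produced) and reports whether the current completion has already been persisted. `wasAlreadySaved` and `ProgressMark` are illustrative names.

```typescript
type ProgressMark = 'completed' | 'in_progress' | undefined;

// priorMarks: progress of each earlier message, excluding the current reply.
function wasAlreadySaved(priorMarks: ProgressMark[]): boolean {
  let lastCompletedIdx = -1;
  let hasInProgressAfter = false;
  for (let i = 0; i < priorMarks.length; i++) {
    const p = priorMarks[i];
    if (p === 'completed') {
      lastCompletedIdx = i;
      hasInProgressAfter = false;
    } else if (p === 'in_progress' && lastCompletedIdx >= 0) {
      hasInProgressAfter = true;
    }
  }
  // Already saved if a completion exists and no new order has started since.
  return lastCompletedIdx >= 0 && !hasInProgressAfter;
}

console.log(wasAlreadySaved(['in_progress', 'in_progress'])); // false: first completion, save it
console.log(wasAlreadySaved(['in_progress', 'completed']));   // true: follow-up question, skip
console.log(wasAlreadySaved(['completed', 'in_progress']));   // false: a second order, save it
```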

The Order schema is a straightforward Mongoose document:

// src/chats/schemas/order.schema.ts
@Schema()
export class Order {
  @Prop({ required: true }) name: string;
  @Prop({ required: true }) drink: string;
  @Prop({ default: null })  size: string;
  @Prop({ default: null })  milk: string;
  @Prop({ default: null })  syrup: string;
  @Prop({ default: null })  sweeter: string;
  @Prop({ default: null })  toppings: string;
  @Prop({ default: 1 })     quantity: number;
}

When an order is saved, the service replaces the model's completion message with a deterministic summary built entirely from knownOrder — no LLM involved in the final confirmation:

Your order has been placed, Alex! ☕

Drink: Grande Latte
Milk: Oat Milk
Syrup: Vanilla Syrup
Sweetener: Classic Syrup
Toppings: Whipped Cream

Would you like to order something else?
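
A summary like the one above can be produced with a few lines of string building. The sketch below infers the format from the sample output only. `buildOrderSummary` and `PlacedOrder` are hypothetical names, and the repo's actual formatter may differ.

```typescript
type PlacedOrder = {
  name: string; drink: string;
  size?: string; milk?: string; syrup?: string; sweeteners?: string; toppings?: string;
};

// Build the confirmation text from known order fields only (no LLM involved).
function buildOrderSummary(o: PlacedOrder): string {
  const lines = [
    `Your order has been placed, ${o.name}! ☕`,
    '',
    `Drink: ${[o.size, o.drink].filter(Boolean).join(' ')}`,
  ];
  const optional: Array<[string, string | undefined]> = [
    ['Milk', o.milk], ['Syrup', o.syrup], ['Sweetener', o.sweeteners], ['Toppings', o.toppings],
  ];
  // Fields the drink does not support are simply left out.
  for (const [label, value] of optional) {
    if (value) lines.push(`${label}: ${value}`);
  }
  lines.push('', 'Would you like to order something else?');
  return lines.join('\n');
}

console.log(buildOrderSummary({ name: 'Alex', drink: 'Latte', size: 'Grande', milk: 'Oat Milk' }));
```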

Step 10: The API and Frontend

The NestJS controller exposes three endpoints:

// src/chats/chats.controller.ts
@Controller('chats')
export class ChatsController {
  @Get('orders')
  getOrders() { ... }                    // Barista dashboard feed

  @Delete('orders/:id')
  deliverOrder(@Param('id') id: string) { ... }   // Mark order delivered

  @Post('message/:thread')
  chatWithAgent(
    @Param('thread') threadId: string,
    @Body() { query }: sendChatDto,
  ) { ... }                              // Customer chat
}

The React frontend creates or retrieves a thread ID from localStorage on load, meaning the conversation survives page refreshes:

// client/src/App.tsx
function getOrCreateThreadId(): string {
  const stored = localStorage.getItem('coffee_thread_id');
  if (stored) return stored;
  const id = crypto.randomUUID();
  localStorage.setItem('coffee_thread_id', id);
  return id;
}

Each response from the agent is structured JSON:

{
  "message": "Great choice! What milk would you like in your Latte?",
  "current_order": {
    "name": "Alex",
    "drink": "Latte",
    "size": "Grande",
    "milk": null,
    "syrup": null,
    "sweeteners": null,
    "toppings": null,
    "quantity": 1
  },
  "suggestions": ["Whole Milk", "Oat Milk", "Almond Milk", "No Milk"],
  "progress": "in_progress"
}
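
Because this shape is the contract between the model and the rest of the app, it is worth checking at runtime before trusting it. The repo lists zod for schema validation. The sketch below uses a hand-written type guard instead so it runs without dependencies, and `isAgentReply` and `AgentReplyShape` are illustrative names.

```typescript
type AgentReplyShape = {
  message: string;
  current_order: Record<string, unknown>;
  suggestions: string[];
  progress: 'in_progress' | 'completed';
};

// Runtime check that a parsed value matches the reply contract.
function isAgentReply(v: unknown): v is AgentReplyShape {
  if (typeof v !== 'object' || v === null) return false;
  const r = v as Record<string, unknown>;
  const order = r['current_order'];
  const suggestions = r['suggestions'];
  const progress = r['progress'];
  return (
    typeof r['message'] === 'string' &&
    typeof order === 'object' && order !== null &&
    Array.isArray(suggestions) && suggestions.every((s) => typeof s === 'string') &&
    (progress === 'in_progress' || progress === 'completed')
  );
}

const parsedReply: unknown = JSON.parse(
  '{"message":"What milk?","current_order":{"name":"Alex"},"suggestions":["Oat Milk"],"progress":"in_progress"}',
);
console.log(isAgentReply(parsedReply));           // true
console.log(isAgentReply({ message: 'What?' }));  // false
```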

The frontend consumes this like any other API — it doesn't need to know anything about LLMs:

// client/src/App.tsx
const handleSend = useCallback(async (text: string) => {
  const data = await sendMessage(threadId.current, text);

  setMessages((prev) => [...prev, { role: 'assistant', content: data.message, ... }]);
  setCurrentOrder(data.current_order);
  setSuggestions(data.suggestions ?? []);
  setProgress(data.progress);
}, []);
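
The `sendMessage` helper itself is not shown in this post. A plausible sketch of the request it makes, based on the endpoint in the lifecycle diagram and the `query` field the controller destructures, is below. `buildChatRequest` is a hypothetical name, and the `/api` prefix is taken from the diagram rather than from the controller code.

```typescript
// Build the URL and fetch options for one chat turn (hypothetical helper).
function buildChatRequest(
  threadId: string,
  query: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `/api/chats/message/${encodeURIComponent(threadId)}`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query }),
    },
  };
}

const chatReq = buildChatRequest('abc-123', "I'd like a latte");
console.log(chatReq.url); // /api/chats/message/abc-123
// In the browser: const res = await fetch(chatReq.url, chatReq.init); const data = await res.json();
```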

The Shift: From Flowchart to Brain

Let's step back and appreciate what this architecture achieves.

A traditional ordering app would look something like this internally:

State: WAITING_FOR_NAME
  if userSaysName → State: WAITING_FOR_DRINK
State: WAITING_FOR_DRINK
  if userSaysDrink → State: WAITING_FOR_SIZE
  if userSaysQuestion → State: SHOW_MENU_HELP
  if userSaysNothing → State: REPEAT_PROMPT
...and so on for every combination

This is brittle. What if the user gives their name and drink in the same message? What if they ask "what's the difference between grande and venti?" mid-order? What if they change their mind? Every edge case needs a new branch.

Coffee Agent handles all of this naturally because the LLM understands intent, not just exact strings. When a user says "Actually, make it a venti" three turns in, Maya understands the correction. When they ask a menu question, she answers it. When they try to finish without giving a name, the guard steps in — not with an ugly error, but with a natural redirect.

The brain doesn't replace your business logic — it handles the messy human communication parts so your business logic doesn't have to.


Running It Yourself

# 1. Clone the repo
git clone https://github.com/pkhamdee/coffee-agent.git
cd coffee-agent

# 2. Install Ollama (Homebrew shown; on other platforms use the official installer) and pull the model
brew install ollama
ollama pull qwen2.5:7b

# 3. Start MongoDB
cd mongodb && docker-compose up -d && cd ..

# 4. Configure environment
# Create .env with:
# OLLAMA_BASE_URL=http://localhost:11434/v1
# OLLAMA_MODEL=qwen2.5:7b
# MONGO_URI=mongodb://admin:password123@localhost:27017

# 5. Start the backend
yarn install
yarn start:dev    # NestJS on :3000

# 6. Start the frontend (new terminal)
cd client
yarn install
yarn dev          # React on :5173

Open http://localhost:5173 to chat with Maya. Open http://localhost:5173/dashboard to see the barista order queue.


Key Design Patterns to Take Away

These patterns from Coffee Agent apply to any agent application you build:

1. Inject state into every prompt, explicitly
Don't ask the LLM to remember things across turns. Put the current state directly in the system prompt every time. knownOrderJson is injected on every single request.

2. Rebuild state from history, don't trust the model
For anything that needs to be accurate, reconstruct it from your authoritative history. Coffee Agent rebuilds knownOrder by replaying messages rather than trusting what the model claims.

3. LLM for generation, code for enforcement
Use the model to write natural, friendly responses. Use deterministic guards to enforce business rules. Override the model's output when it violates a rule.

4. Track session boundaries within a thread
A single thread can span multiple orders. Finding the session start (sessionStart) is what makes this work cleanly without requiring a new thread per order.

5. Make your outputs structured
Ask the model to always return JSON. This decouples the agent from the frontend: React consumes agent responses exactly like any other API.

6. Guard against hallucinated names
The INVALID_NAME_TERMS deny-list is a real production concern. Without it, the model routinely stores drink names, size names, and common words as the customer's name.


Summary

Agents aren't magic, and they aren't a replacement for well-written code. But they are a fundamentally different way to handle the parts of an application that are hard to express as a flowchart.

Here's what the Coffee Agent demo illustrates in concrete terms:

  • An agent is a reasoning loop — it reads context, generates a response, and that response drives the application forward
  • LangGraph makes state manageable — threads, MongoDB checkpoints, and session boundaries give you observable, debuggable conversation history
  • Local LLMs are production-viable — Ollama with qwen2.5:7b runs on a laptop and handles multi-turn ordering conversations reliably
  • Guardrails are not optional — the most important code in an agent application is often the deterministic code that validates and overrides the LLM
  • The frontend doesn't care — from React's perspective, the agent is just an API that returns JSON
  • Session management matters — handling multiple orders in one thread, and tracking boundaries between them, is a real architectural concern that most demos ignore

The shift from sequential code to agent-driven logic isn't all-or-nothing. You can add a single agent node to an existing NestJS or Express application — handling one complex, conversational feature — while keeping everything else exactly as it was.

The most important thing is to start. Pick one part of your application where users navigate a messy, multi-step interaction. Replace the flowchart with a brain.

That's what Coffee Agent is: not a proof of concept, but a blueprint you can clone and run today.


Questions or discussion? Connect on LinkedIn, X or reach out via email.
