Master Generative AI, Part 4: Practical Applications
Part 4 of the Master Generative AI: A Step-by-Step Challenge series.
Series Map:
- Part 1 → Foundation of AI & ML
- Part 2 → Working with LLMs
- Part 3 → Advanced Generative AI
- Part 4 → Practical Applications ← you are here
- Part 5 → Career & Capstone Projects
Theory meets reality in this part. We take the tools from Parts 1–3 and apply them to the domains where generative AI is already creating measurable business value — and where practitioners are most in demand in 2026.
Chapter 1: Generative AI for Code
Why Code Is the Killer App for LLMs
Code is text, and LLMs trained on large volumes of public source code (much of it from GitHub) become useful for:
- Autocomplete: finishing functions from a signature or comment
- Refactoring: improving existing code without changing behavior
- Bug fixing: identifying and correcting errors
- Test generation: writing unit tests from implementation
- Documentation: generating docstrings, README, API docs
- Translation: converting code from one language to another
GitHub Copilot: What It Actually Does
Copilot sends the code around your cursor (the current file, plus context such as other open files) to an LLM, which predicts what comes next:
```python
# You type this comment:
# Function to calculate compound interest

# Copilot suggests (Tab to accept):
def calculate_compound_interest(
    principal: float,
    rate: float,
    time: int,
    n: int = 12,  # compounding frequency per year
) -> float:
    """Calculate compound interest.

    Args:
        principal: Initial investment amount
        rate: Annual interest rate (as decimal, e.g., 0.05 for 5%)
        time: Investment period in years
        n: Number of times interest compounds per year

    Returns:
        Final amount after compound interest
    """
    return principal * (1 + rate / n) ** (n * time)
```
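Code-completion models are commonly trained with a fill-in-the-middle (FIM) objective. The text before the cursor and the text after it are both sent, and the model generates the missing middle. Below is a minimal sketch of how such a prompt could be assembled. The `<PRE>`/`<SUF>`/`<MID>` sentinels are an assumption modeled on Code Llama's infilling format. Copilot's actual prompt format is not public, and other models use different tokens.

```python
def build_fim_prompt(source: str, cursor: int) -> str:
    """Split a file at the cursor offset and wrap both halves in FIM sentinels.

    The <PRE>/<SUF>/<MID> strings are assumed (Code Llama-style); check your model's docs.
    """
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor must be within the source text")
    prefix, suffix = source[:cursor], source[cursor:]
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"


source = "def add(a, b):\n    \n\nprint(add(1, 2))\n"
cursor = source.index("\n\n")  # cursor sits inside the empty function body
print(build_fim_prompt(source, cursor))
```

The model's completion would then be spliced in at `cursor`, between the prefix and the suffix.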
Code Llama and Local Code Models
For private codebases where you can't send code to external APIs:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Code Llama: open-weight model that can run locally
model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)


def generate_code(instruction: str, context: str = "") -> str:
    context_block = f"Context:\n{context}\n" if context else ""
    prompt = f"[INST] {instruction}\n{context_block}Provide only the code, no explanation. [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs, max_new_tokens=500, temperature=0.1, do_sample=True
        )
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Generate unit tests
code = """
def merge_sorted_lists(list1: list, list2: list) -> list:
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            result.append(list1[i]); i += 1
        else:
            result.append(list2[j]); j += 1
    return result + list1[i:] + list2[j:]
"""
tests = generate_code(
    instruction="Write comprehensive pytest unit tests for this function",
    context=code,
)
print(tests)
```
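Even when told to "provide only the code", instruction-tuned models often wrap their answer in Markdown fences. That breaks any tool that writes the output straight to a `.py` file. The small post-processing helper below is a sketch. It assumes the first fenced block is the one you want and falls back to the raw text when there is no fence.

```python
import re

# A fence is three backticks, an optional language tag, a newline, then the body.
# `{3} is used instead of literal backticks so this snippet nests cleanly in Markdown.
FENCE_RE = re.compile(r"`{3}[\w+-]*\n(.*?)`{3}", re.DOTALL)


def extract_code(model_output: str) -> str:
    """Return the body of the first fenced code block, or the whole output if none."""
    match = FENCE_RE.search(model_output)
    return (match.group(1) if match else model_output).strip()
```

For example, `extract_code(tests)` returns just the pytest source from the Code Llama call above, ready to save as `test_merge.py`.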
Practical Patterns for Code AI
````python
import json

from openai import OpenAI

client = OpenAI()


def ai_code_review(code: str, language: str = "Python") -> dict:
    """AI-powered code review."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "You are a senior software engineer doing code review. "
                       "Be specific, actionable, and kind."
        }, {
            "role": "user",
            "content": f"""Review this {language} code and return JSON with:
{{
  "issues": [{{ "line": N, "severity": "critical|major|minor", "description": "...", "fix": "..." }}],
  "score": 1-10,
  "summary": "..."
}}
Code:
```{language.lower()}
{code}
```"""
        }],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)


def ai_generate_tests(code: str) -> str:
    """Generate unit tests for given code."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "Generate pytest tests with >90% coverage. Use parametrize for edge cases."
        }, {
            "role": "user",
            "content": f"Write tests for:\n```python\n{code}\n```"
        }]
    )
    return response.choices[0].message.content


def ai_explain_code(code: str, audience: str = "junior developer") -> str:
    """Explain code for a specific audience."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Explain this code to a {audience}. "
                       f"Use simple language and analogies:\n\n{code}"
        }]
    )
    return response.choices[0].message.content
````
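JSON mode guarantees syntactically valid JSON. It does not guarantee that the keys and values match the schema in the prompt. Before acting on a review, for example by failing a CI job on critical issues, it is worth checking its shape. The sketch below uses only the standard library. The `has_blocking_issues` policy is a hypothetical example, not part of the pipeline above.

```python
from typing import Any

VALID_SEVERITIES = {"critical", "major", "minor"}


def validate_review(review: dict[str, Any]) -> list[str]:
    """Return schema problems; an empty list means the review looks well-formed."""
    problems = []
    issues = review.get("issues")
    if not isinstance(issues, list):
        problems.append("'issues' must be a list")
        issues = []
    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"issue {i} is not an object")
            continue
        if issue.get("severity") not in VALID_SEVERITIES:
            problems.append(f"issue {i} has invalid severity {issue.get('severity')!r}")
    score = review.get("score")
    if not isinstance(score, int) or not 1 <= score <= 10:
        problems.append("'score' must be an integer from 1 to 10")
    if not isinstance(review.get("summary"), str):
        problems.append("'summary' must be a string")
    return problems


def has_blocking_issues(review: dict[str, Any]) -> bool:
    """Hypothetical CI policy: block the merge if any issue is critical."""
    return any(
        isinstance(issue, dict) and issue.get("severity") == "critical"
        for issue in review.get("issues", [])
    )
```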
Chapter 2: Generative AI for Business
Core Business Use Cases in 2026
| Use Case | Typical Impact (illustrative) | Typical Tool |
|---|---|---|
| Meeting summarization | 2 hrs/meeting → 2 min | Whisper + GPT-4o |
| Email drafting | 20 min → 2 min | Claude/GPT-4o |
| Report generation | 4 hrs → 30 min | LLM + structured data |
| Customer support | 80% deflection | RAG chatbot |
| Contract analysis | 3 hrs → 15 min | GPT-4o with vision |
| Data extraction from docs | 1 hr/doc → 30 sec | Vision + structured output |
Document Intelligence Pipeline
```python
import anthropic
import base64
import json
from pathlib import Path

client = anthropic.Anthropic()


def extract_invoice_data(file_path: str) -> dict:
    """Extract structured data from an invoice PDF or image (PNG/JPEG)."""
    path = Path(file_path)
    encoded = base64.standard_b64encode(path.read_bytes()).decode()
    # PDFs are sent as a "document" block; images need an "image" block with a matching media type
    if path.suffix.lower() == ".pdf":
        file_block = {"type": "document", "source": {
            "type": "base64", "media_type": "application/pdf", "data": encoded
        }}
    else:
        media_type = "image/jpeg" if path.suffix.lower() in (".jpg", ".jpeg") else "image/png"
        file_block = {"type": "image", "source": {
            "type": "base64", "media_type": media_type, "data": encoded
        }}
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": [
                file_block,
                {"type": "text", "text": """Extract all invoice data as JSON:
{
  "vendor": {"name": "", "address": "", "email": ""},
  "invoice_number": "",
  "date": "YYYY-MM-DD",
  "due_date": "YYYY-MM-DD",
  "line_items": [{"description": "", "quantity": 0, "unit_price": 0, "total": 0}],
  "subtotal": 0,
  "tax": 0,
  "total": 0,
  "currency": "USD"
}
Return ONLY valid JSON, no explanation."""}
            ]
        }]
    )
    return json.loads(response.content[0].text)


def summarize_meeting(transcript: str) -> dict:
    """Structure a meeting transcript into an actionable summary."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": f"""Summarize this meeting transcript as JSON:
{{
  "tldr": "one sentence summary",
  "key_decisions": ["..."],
  "action_items": [{{"owner": "", "task": "", "due": "YYYY-MM-DD"}}],
  "open_questions": ["..."],
  "next_meeting_agenda": ["..."]
}}
Return ONLY valid JSON, no explanation.

Transcript:
{transcript}"""
        }]
    )
    return json.loads(response.content[0].text)
```
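Extracted numbers should be cross-checked before they reach an accounting system, because vision models occasionally misread a digit. Below is a sketch of an arithmetic consistency check over the invoice schema above. The 0.01 tolerance is an assumed rounding allowance.

```python
def check_invoice_totals(invoice: dict, tolerance: float = 0.01) -> list[str]:
    """Return mismatches between line items, subtotal, tax, and total."""
    problems = []
    items = invoice.get("line_items", [])
    for i, item in enumerate(items):
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["total"]) > tolerance:
            problems.append(f"line {i}: {item['quantity']} x {item['unit_price']} != {item['total']}")
    items_sum = sum(item["total"] for item in items)
    if abs(items_sum - invoice["subtotal"]) > tolerance:
        problems.append(f"line items sum to {items_sum}, subtotal is {invoice['subtotal']}")
    if abs(invoice["subtotal"] + invoice["tax"] - invoice["total"]) > tolerance:
        problems.append(
            f"subtotal + tax = {invoice['subtotal'] + invoice['tax']}, total is {invoice['total']}"
        )
    return problems
```

An empty list means the numbers agree. Anything else is a good candidate for human review rather than automatic rejection.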
Business Automation with AI
```python
import json

from openai import OpenAI


class BusinessAIAssistant:
    def __init__(self):
        self.client = OpenAI()

    def draft_professional_email(
        self, context: str, tone: str = "professional", recipient: str = "colleague"
    ) -> str:
        return self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "system",
                "content": f"Write a {tone} email to a {recipient}. "
                           "Be concise. Do not add placeholders."
            }, {
                "role": "user", "content": context
            }]
        ).choices[0].message.content

    def analyze_customer_feedback(self, reviews: list[str]) -> dict:
        # Only the first 50 reviews are sent, to keep the prompt bounded
        combined = "\n".join(f"- {r}" for r in reviews[:50])
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"""Analyze these customer reviews. Return JSON:
{{
  "overall_sentiment": "positive|negative|mixed",
  "nps_estimate": -100 to 100,
  "top_positives": ["..."],
  "top_complaints": ["..."],
  "suggested_improvements": ["..."],
  "urgent_issues": ["..."]
}}
Reviews:
{combined}"""
            }],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

    def generate_report(self, data: dict, report_type: str) -> str:
        return self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": f"You are a business analyst. Write a {report_type} report. "
                           "Use professional language. Include executive summary, "
                           "findings, and recommendations."
            }, {
                "role": "user",
                "content": f"Generate report from this data:\n{json.dumps(data, indent=2, default=str)}"
            }]
        ).choices[0].message.content
```
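`analyze_customer_feedback` silently ignores every review after the first 50. One workaround is to analyze the reviews in batches and merge the per-batch results. The sketch below covers the batching and a simple merge of list fields. The batch size of 50 is carried over from the original, and the exact-string de-duplication is an assumption. Paraphrased complaints will not be merged, so a final LLM consolidation pass is an alternative.

```python
from collections.abc import Iterator


def batched_reviews(reviews: list[str], batch_size: int = 50) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size reviews."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(reviews), batch_size):
        yield reviews[start:start + batch_size]


def merge_lists(results: list[dict], key: str, limit: int = 5) -> list[str]:
    """Merge one list field (e.g. 'top_complaints') across batches, de-duplicated, order kept."""
    seen: dict[str, None] = {}
    for result in results:
        for entry in result.get(key, []):
            seen.setdefault(entry, None)
    return list(seen)[:limit]
```

Usage: `results = [assistant.analyze_customer_feedback(b) for b in batched_reviews(all_reviews)]`, then call `merge_lists(results, "top_complaints")`.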
Chapter 3: Generative AI for Education & Research
AI-Powered Learning Tools
```python
import json

from openai import OpenAI

client = OpenAI()


def adaptive_tutor(topic: str, student_level: str, question: str) -> str:
    """Tutor that adapts to student level."""
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"""You are an expert tutor teaching {topic}.
Student level: {student_level}
Rules:
- Use analogies and examples appropriate for their level
- Check understanding with a follow-up question
- If they seem confused, approach from a different angle
- Celebrate progress, be encouraging"""
        }, {
            "role": "user", "content": question
        }]
    ).choices[0].message.content


def generate_quiz(topic: str, num_questions: int = 5, difficulty: str = "medium") -> list:
    """Generate a quiz with answers."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"""Create {num_questions} {difficulty} multiple-choice questions about {topic}.
Return a JSON object with a "questions" array:
{{"questions": [{{
  "question": "...",
  "options": ["A) ...", "B) ...", "C) ...", "D) ..."],
  "answer": "A",
  "explanation": "..."
}}]}}"""
        }],
        # JSON mode always returns a top-level object, so the list is wrapped in "questions"
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)["questions"]


def research_assistant(query: str, context_papers: list[str] | None = None) -> dict:
    """AI research assistant for literature analysis."""
    system = """You are a research assistant. Help researchers:
- Summarize academic papers accurately
- Identify research gaps
- Suggest related work
- Explain technical concepts clearly
Always note limitations and uncertainties."""
    user_content = f"Research query: {query}"
    if context_papers:
        user_content += "\n\nRelevant papers:\n" + "\n---\n".join(context_papers)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_content}
        ]
    )
    return {"answer": response.choices[0].message.content,
            "tokens_used": response.usage.total_tokens}
```
Automating Literature Review
```python
import arxiv
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer


def literature_review_pipeline(research_topic: str, max_papers: int = 20) -> dict:
    """Automated literature review using arXiv + AI."""
    # Step 1: Fetch the most relevant papers (arxiv>=2.0 runs searches through a Client)
    search = arxiv.Search(
        query=research_topic,
        max_results=max_papers,
        sort_by=arxiv.SortCriterion.Relevance
    )
    papers = list(arxiv.Client().results(search))

    # Step 2: Embed abstracts; normalized vectors make the dot product a cosine similarity
    embed_model = SentenceTransformer("all-MiniLM-L6-v2")
    abstracts = [p.summary[:500] for p in papers]
    embeddings = embed_model.encode(abstracts, normalize_embeddings=True)

    # Step 3: Keep the 5 abstracts most similar to the query
    query_embedding = embed_model.encode([research_topic], normalize_embeddings=True)
    similarities = np.dot(embeddings, query_embedding.T).flatten()
    top_indices = np.argsort(similarities)[::-1][:5]
    top_papers = [papers[i] for i in top_indices]

    # Step 4: AI synthesis
    paper_summaries = "\n\n".join([
        f"Title: {p.title}\nAuthors: {', '.join(str(a) for a in p.authors[:3])}\n"
        f"Abstract: {p.summary[:300]}..."
        for p in top_papers
    ])
    client = OpenAI()
    synthesis = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"""Based on these papers about '{research_topic}', provide:
1. Key themes and findings
2. Research gaps and open questions
3. Methodological approaches used
4. Most impactful papers and why

Papers:
{paper_summaries}"""
        }]
    ).choices[0].message.content

    return {
        "topic": research_topic,
        "papers_analyzed": len(papers),
        "top_papers": [{"title": p.title, "url": p.entry_id} for p in top_papers],
        "synthesis": synthesis
    }
```
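Step 3 relies on a simple identity. When embeddings are L2-normalized, their dot product equals their cosine similarity, so one matrix-vector product scores every abstract against the query. The self-contained sketch below uses made-up 3-dimensional vectors in place of real sentence embeddings.

```python
import numpy as np


def top_k_by_cosine(doc_vectors: np.ndarray, query_vector: np.ndarray, k: int) -> list[int]:
    """Return indices of the k documents most similar to the query, best first."""
    # Normalize rows and the query so that the dot product equals cosine similarity
    docs = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    query = query_vector / np.linalg.norm(query_vector)
    scores = docs @ query
    return np.argsort(scores)[::-1][:k].tolist()


# Toy "embeddings" (hypothetical values, not real model output)
docs = np.array([[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.1, 0.0])
print(top_k_by_cosine(docs, query, k=2))  # [0, 1]
```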
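Chapter 4: Generative AI in Healthcare
Current Applications (2026)
FDA-cleared AI Applications (examples):
- Radiology: flag urgent findings such as intracranial hemorrhage on CT scans (Aidoc)
- Ophthalmology: autonomous diabetic retinopathy screening from retinal images (IDx-DR)
- Pathology: analyze prostate biopsy slides (Paige.AI)
- Cardiology: ECG interpretation, e.g., atrial fibrillation detection (Apple Watch, Cardiologs)
In Use, but Not FDA-Cleared Devices:
- Drug discovery: protein structure prediction (AlphaFold 3)
- Clinical notes: auto-generate from doctor-patient conversation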
Emerging (Research Stage):
- Drug-drug interaction prediction
- Personalized treatment recommendations
- Rare disease diagnosis from phenotypes
- Clinical trial patient matching
Protein Structure Prediction with ESMFold
```python
# ESMFold (Meta's open-source protein folding model) via the public ESM Atlas API.
# AlphaFold 3 is an alternative, but this example calls ESMFold only.
import requests


def predict_protein_structure(sequence: str) -> dict:
    """Predict the 3D structure of a protein from its amino acid sequence."""
    response = requests.post(
        "https://api.esmatlas.com/foldSequence/v1/pdb/",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        data=sequence,
        timeout=120,
    )
    response.raise_for_status()  # fail loudly instead of treating an error page as PDB data
    return {
        "pdb_structure": response.text,  # 3D structure in PDB format
        "sequence_length": len(sequence),
        "sequence": sequence,
    }


# Example: insulin A-chain
insulin_a = "GIVEQCCTSICSLYQLENYCN"
result = predict_protein_structure(insulin_a)
# result["pdb_structure"] can be visualized in PyMOL or Mol*
```
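A predicted structure is only as useful as its confidence. Folding models such as ESMFold and AlphaFold are commonly reported to store per-residue confidence (pLDDT) in the B-factor column of their PDB output. The scale (0 to 1 or 0 to 100) is an assumption here, so check it against your own output. The sketch below reads that column from the fixed-width ATOM records, keeping one value per residue (the alpha carbon).

```python
def ca_confidence(pdb_text: str) -> list[float]:
    """Read the B-factor field (PDB columns 61-66) of every C-alpha ATOM record."""
    scores = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; "CA" is the alpha carbon, one per residue
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def mean_confidence(pdb_text: str) -> float:
    """Average per-residue confidence; 0.0 if no C-alpha records are found."""
    scores = ca_confidence(pdb_text)
    return sum(scores) / len(scores) if scores else 0.0
```

Usage: `mean_confidence(result["pdb_structure"])`. Low-confidence regions are usually the first place to distrust a prediction.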
Medical Document Analysis
```python
import json

from openai import OpenAI


def analyze_medical_report(report_text: str) -> dict:
    """Extract structured information from medical reports."""
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """You are a medical information extraction system.
Extract information accurately. Mark uncertain items with '(uncertain)'.
NEVER provide medical advice or diagnosis. Only extract what is stated."""
        }, {
            "role": "user",
            # JSON mode requires the word "JSON" to appear somewhere in the messages
            "content": f"""Extract the following fields from this medical report and return them as JSON:
{{
  "patient_demographics": {{}},
  "chief_complaint": "",
  "diagnoses": [{{"icd_code": "", "description": ""}}],
  "medications": [{{"name": "", "dose": "", "frequency": ""}}],
  "lab_results": [{{"test": "", "value": "", "unit": "", "flag": "normal|high|low"}}],
  "recommendations": [],
  "follow_up": ""
}}

Report:
{report_text}"""
        }],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)
```
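The system prompt asks the model to tag doubtful values with '(uncertain)'. Those tags only help if something acts on them. The sketch below walks the extracted JSON and lists every tagged field so a clinician can review it. How flagged records are routed to a human reviewer is left to your workflow.

```python
from typing import Any


def find_uncertain_fields(data: Any, path: str = "") -> list[str]:
    """Return dotted paths of string values that contain the '(uncertain)' marker."""
    found = []
    if isinstance(data, dict):
        for key, value in data.items():
            found.extend(find_uncertain_fields(value, f"{path}.{key}" if path else key))
    elif isinstance(data, list):
        for i, value in enumerate(data):
            found.extend(find_uncertain_fields(value, f"{path}[{i}]"))
    elif isinstance(data, str) and "(uncertain)" in data:
        found.append(path)
    return found
```

A non-empty result is a natural trigger for mandatory human review before the record is stored.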
Healthcare AI Caution
AI in healthcare requires:
- Regulatory compliance: FDA (US), CE marking (EU), TGA (Australia)
- Clinical validation: results must be clinically validated on diverse populations
- Human oversight: AI assists, never replaces clinical judgment
- Audit trails: every AI decision must be logged and explainable
- Data privacy: HIPAA (US), PDPA (Thailand), GDPR (EU) compliance
Chapter 5: Generative AI in Marketing
Content Generation at Scale
```python
from dataclasses import dataclass

from openai import OpenAI

client = OpenAI()


@dataclass
class MarketingContent:
    product_name: str
    product_description: str
    target_audience: str
    tone: str = "professional"
    brand_voice: str = "innovative and trustworthy"


def generate_marketing_suite(content: MarketingContent) -> dict:
    """Generate a full suite of marketing content from product info."""
    context = f"""
Product: {content.product_name}
Description: {content.product_description}
Target Audience: {content.target_audience}
Tone: {content.tone}
Brand Voice: {content.brand_voice}"""

    def generate(task: str) -> str:
        return client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"Marketing copywriter.{context}"},
                {"role": "user", "content": task}
            ],
            temperature=0.8  # more creative for marketing
        ).choices[0].message.content

    return {
        "headline": generate("Write 5 compelling headlines (max 10 words each). Return as numbered list."),
        "tagline": generate("Write 3 punchy taglines (max 6 words each)."),
        "email_subject": generate("Write 5 email subject lines with >30% open rate potential."),
        "social_linkedin": generate("Write a LinkedIn post (200 words, professional, include CTA)."),
        "social_twitter": generate("Write 3 tweets (max 280 chars each, include relevant hashtags)."),
        "ad_copy_short": generate("Write a Google Ads headline (30 chars) and description (90 chars)."),
        "seo_meta": generate("Write an SEO meta title (60 chars) and description (155 chars)."),
        "product_description": generate("Write a compelling product description (150 words, benefits-focused)."),
    }


# Usage
product = MarketingContent(
    product_name="CloudAI Analytics",
    product_description="Real-time business intelligence platform powered by AI",
    target_audience="B2B SaaS CTOs and data teams",
    tone="confident and innovative",
)
suite = generate_marketing_suite(product)
for asset, text in suite.items():
    print(f"\n{'=' * 40}\n{asset.upper()}\n{text}")
```
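LLMs count tokens, not characters, so length-limited assets such as the Google Ads and SEO fields above often overshoot. The sketch below checks generated text against the limits stated in the prompts (30, 90, 60, and 155 characters). It assumes you have already split each asset into separate fields, and platform limits may change, so confirm them with each platform.

```python
# Limits copied from the prompts above; platforms may change them.
CHAR_LIMITS = {
    "ads_headline": 30,
    "ads_description": 90,
    "seo_title": 60,
    "seo_description": 155,
}


def over_limit(fields: dict[str, str], limits: dict[str, int] = CHAR_LIMITS) -> dict[str, int]:
    """Map each field that exceeds its limit to how many characters it is over."""
    return {
        name: len(text) - limits[name]
        for name, text in fields.items()
        if name in limits and len(text) > limits[name]
    }
```

Any field that comes back can be regenerated with a prompt that states the overshoot, for example "shorten by 5 characters".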
Personalization at Scale
```python
# Reuses the `client = OpenAI()` created in the previous example.
def personalized_email_campaign(
    base_offer: str, customer_segments: list[dict]
) -> list[dict]:
    """Generate personalized emails for each customer segment."""
    results = []
    for segment in customer_segments:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "system",
                "content": "Email marketing specialist. Write personalized, conversion-focused emails."
            }, {
                "role": "user",
                "content": f"""Write a personalized marketing email.
Offer: {base_offer}
Customer segment: {segment['name']}
Demographics: {segment.get('demographics', '')}
Past behavior: {segment.get('past_behavior', '')}
Pain points: {segment.get('pain_points', '')}
Format: Subject line + Email body (150 words max). Include one clear CTA."""
            }]
        )
        results.append({
            "segment": segment["name"],
            "email": response.choices[0].message.content
        })
    return results


# A/B testing with AI
def ab_test_copy(concept: str, variants: int = 3) -> str:
    """Generate copy variants for A/B testing, returned as one labeled text block."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"""Create {variants} distinct A/B test variants for:
{concept}
Make each meaningfully different:
- Variant A: benefit-focused
- Variant B: urgency-focused
- Variant C: social proof-focused
Return each clearly labeled."""
        }]
    )
    return response.choices[0].message.content
```
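Generating variants is only half of an A/B test. Each customer must also see the same variant every time, or conversions cannot be attributed. A common approach is to hash a stable user ID into a bucket. In the sketch below, the experiment name acts as a salt, so different tests split users independently. The function and experiment names are illustrative.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to one variant for a given experiment."""
    if not variants:
        raise ValueError("need at least one variant")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # Interpret the hash as a large integer and take it modulo the number of variants
    return variants[int(digest, 16) % len(variants)]


variants = ["benefit", "urgency", "social_proof"]
print(assign_variant("user-42", "spring-launch", variants))
```

Because the assignment is a pure function of its inputs, no lookup table is needed. Any service can recompute it when it logs a conversion.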
Chapter 6: Building an AI Agent with Tool-Calling & APIs
What Is an AI Agent?
An agent is an LLM that can use tools (functions, APIs) to take actions in the world:
User: "What's the weather in Bangkok and should I bring an umbrella?"
Agent loop:
1. Think: "I need weather data. I have a weather tool."
2. Call tool: get_weather(city="Bangkok")
3. Observe result: {"temp": 32, "humidity": 90, "rain_chance": 70}
4. Think: "High rain chance. I should recommend an umbrella."
5. Respond: "It's 32°C with 70% chance of rain. Yes, bring an umbrella!"
Building an Agent with Tool-Calling
```python
import json
import os
from datetime import datetime

import requests
from openai import OpenAI

client = OpenAI()

# Define tools (functions the agent can call)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "country_code": {"type": "string", "description": "ISO country code e.g. TH"}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform mathematical calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression e.g. '15 * 24 + 100'"}
                },
                "required": ["expression"]
            }
        }
    },
]


# Tool implementations
def get_weather(city: str, country_code: str = "") -> dict:
    """Call the OpenWeatherMap current-weather API."""
    api_key = os.environ["OPENWEATHER_API_KEY"]  # read from the environment, not source code
    location = f"{city},{country_code}" if country_code else city
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": location, "appid": api_key, "units": "metric"},
        timeout=5,
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "city": city,
        "temperature": data["main"]["temp"],
        "description": data["weather"][0]["description"],
        "humidity": data["main"]["humidity"],
    }


def calculate(expression: str) -> dict:
    """Evaluate a basic arithmetic expression (+ - * / % and parentheses)."""
    allowed = set("0123456789+-*/().% ")
    # Reject names/attributes entirely, and "**" to avoid huge exponentiations
    if not all(c in allowed for c in expression) or "**" in expression:
        return {"error": "Unsupported characters or operator in expression"}
    try:
        # Still eval: mitigated by the checks above and empty builtins, but not bulletproof
        result = eval(expression, {"__builtins__": {}}, {})
        return {"expression": expression, "result": result}
    except Exception as e:
        return {"error": str(e)}


def search_web(query: str) -> dict:
    """Simulate web search (replace with Serper/Tavily API)."""
    return {"query": query, "results": f"[Search results for: {query}]"}


TOOL_FUNCTIONS = {
    "get_weather": get_weather,
    "calculate": calculate,
    "search_web": search_web,
}


# The agent loop
def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [
        {"role": "system", "content": f"You are a helpful assistant. Today is {datetime.now().date()}. Use tools when needed."},
        {"role": "user", "content": user_message}
    ]
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        choice = response.choices[0]
        messages.append(choice.message)  # add assistant message to history

        # If no tool calls → we have our final answer
        if not choice.message.tool_calls:
            return choice.message.content

        # Execute each tool call
        for tool_call in choice.message.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            print(f"[Agent] Calling {func_name}({func_args})")
            func = TOOL_FUNCTIONS.get(func_name)
            # Report unknown tools back to the model instead of crashing the loop
            result = func(**func_args) if func else {"error": f"Unknown tool: {func_name}"}
            print(f"[Agent] Result: {result}")
            # Add tool result to messages
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })
    return "Agent reached maximum iterations without completing."


# Test the agent
print(run_agent("What's the weather in Bangkok and if the temperature is above 30°C, how many degrees above freezing is that?"))
```
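Character whitelisting plus `eval` is fragile. A more robust option is to parse the expression with Python's `ast` module and evaluate only the node types you explicitly allow. In the sketch below, the operator set mirrors the original whitelist (`+ - * / %` and parentheses), and exponentiation is deliberately left out. The `calculate` wrapper keeps the same return shape as the tool above, so it can be swapped in.

```python
import ast
import operator

# Only these operators are evaluated; anything else raises ValueError.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate an arithmetic expression by walking its syntax tree (no eval)."""
    def visit(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](visit(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))


def calculate(expression: str) -> dict:
    """Drop-in replacement for the calculate tool above."""
    try:
        return {"expression": expression, "result": safe_eval(expression)}
    except (ValueError, SyntaxError, ZeroDivisionError) as e:
        return {"error": str(e)}
```

Function calls, names, attribute access, and `**` all fall through to the `ValueError` branch, so the model cannot smuggle code through the tool.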
Multi-Agent Systems
For complex tasks, multiple specialized agents collaborate:
```python
from openai import OpenAI

client = OpenAI()


def researcher_agent(topic: str) -> str:
    """Specialized agent for research tasks."""
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "You are a research specialist. Find facts, cite sources, be accurate."
        }, {"role": "user", "content": f"Research: {topic}"}]
    ).choices[0].message.content


def writer_agent(research: str, output_format: str) -> str:
    """Specialized agent for writing."""
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"You are an expert writer. Format: {output_format}. Be engaging and clear."
        }, {"role": "user", "content": f"Write using this research:\n{research}"}]
    ).choices[0].message.content


def critic_agent(content: str) -> str:
    """Specialized agent for quality review."""
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "You are an editor. Find factual errors, logical gaps, improve clarity."
        }, {"role": "user", "content": f"Review and improve:\n{content}"}]
    ).choices[0].message.content


def orchestrated_content_pipeline(topic: str, output_format: str = "blog post") -> str:
    """Orchestrate multiple agents to produce high-quality content."""
    print(f"[Orchestrator] Starting pipeline for: {topic}")
    print("[Orchestrator] Assigning to Researcher...")
    research = researcher_agent(topic)
    print("[Orchestrator] Assigning to Writer...")
    draft = writer_agent(research, output_format)
    print("[Orchestrator] Assigning to Critic...")
    final = critic_agent(draft)
    return final
```
Chapter 7: Deploying AI Models on Cloud
Deployment Architecture Options
Option A: API (Simplest, most common)
Your App → OpenAI/Anthropic/Google API → Response
Pro: Zero infrastructure, always updated
Con: Cost scales with usage, data leaves your environment
Option B: Managed AI Services
Your App → AWS Bedrock / GCP Vertex AI / Azure OpenAI → Response
Pro: Enterprise compliance, regional data residency
Con: Less model variety, vendor lock-in
Option C: Self-hosted (vLLM on GPU servers)
Your App → Your vLLM Server → Open-source model → Response
Pro: Full control, data privacy, predictable cost at scale
Con: GPU expertise required, operational overhead
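The "predictable cost at scale" argument for self-hosting is mostly arithmetic. API spend grows linearly with tokens, while an always-on GPU server costs the same per hour whether it is busy or idle. The sketch below computes the break-even volume. Every price in it is a hypothetical placeholder, not a quote from any provider.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """API bill: linear in token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def monthly_selfhost_cost(gpu_hourly_rate: float, num_gpus: int, hours: float = 730) -> float:
    """Always-on GPUs (~730 hours per month). Ignores staff and operational overhead."""
    return gpu_hourly_rate * num_gpus * hours


def break_even_tokens(price_per_million_tokens: float, gpu_hourly_rate: float, num_gpus: int) -> float:
    """Monthly token volume at which self-hosting costs the same as the API."""
    return monthly_selfhost_cost(gpu_hourly_rate, num_gpus) / price_per_million_tokens * 1_000_000


# Placeholder prices (hypothetical): $5 per million tokens vs. one GPU at $2/hour
print(f"{break_even_tokens(5.0, 2.0, 1):,.0f} tokens/month")  # 292,000,000 tokens/month
```

Above the break-even volume, self-hosting is cheaper on paper. This only holds if one GPU can actually serve that throughput and you have the expertise to run it, which are the costs listed under Option C.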
AWS Bedrock Deployment
```python
import json

import boto3

# AWS Bedrock: access Claude, Llama, Titan, and others via AWS
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")


def bedrock_chat(prompt: str, model_id: str = "anthropic.claude-3-5-sonnet-20241022-v2:0") -> str:
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}]
    })
    response = bedrock.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=body
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]


# Streaming on Bedrock
def bedrock_stream(prompt: str) -> None:
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}]
    })
    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body
    )
    for event in response["body"]:
        chunk = json.loads(event["chunk"]["bytes"])
        if chunk["type"] == "content_block_delta":
            print(chunk["delta"]["text"], end="", flush=True)
```
GCP Vertex AI Deployment
```python
import vertexai
from google.cloud import aiplatform
from vertexai.generative_models import GenerativeModel

# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")

# Use Gemini models through Vertex AI
model = GenerativeModel("gemini-2.0-flash-exp")
response = model.generate_content("Explain generative AI in 3 sentences.")
print(response.text)


# Deploy your own model to Vertex AI Endpoints
def deploy_custom_model(
    model_artifact_uri: str,
    model_display_name: str,
    machine_type: str = "n1-standard-4"
) -> str:
    """Deploy a fine-tuned model to Vertex AI."""
    aiplatform.init(project="your-project-id", location="us-central1")
    model = aiplatform.Model.upload(
        display_name=model_display_name,
        artifact_uri=model_artifact_uri,  # gs://your-bucket/model/
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/pytorch-gpu.2-2:latest"
    )
    endpoint = model.deploy(
        machine_type=machine_type,
        accelerator_type="NVIDIA_TESLA_T4",
        accelerator_count=1,
        min_replica_count=1,
        max_replica_count=5,  # auto-scales between 1 and 5 replicas
    )
    return endpoint.resource_name
```
Azure OpenAI Service
```python
import os

from openai import AzureOpenAI

# Azure OpenAI: same API, enterprise compliance
azure_client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # keep keys out of source code
    api_version="2024-02-01"
)

response = azure_client.chat.completions.create(
    model="gpt-4o",  # your deployment name in Azure
    messages=[{"role": "user", "content": "What is generative AI?"}]
)
print(response.choices[0].message.content)
```
Production Deployment Checklist
Infrastructure:
☐ GPU node provisioned with right VRAM for chosen model
☐ Load balancer in front of multiple model servers
☐ Auto-scaling configured (scale up under load, down at idle)
☐ Health checks and readiness probes on model endpoints
☐ Model warm-up request at startup (avoid cold start latency)
Reliability:
☐ Fallback model if primary is unavailable
☐ Request timeouts and retry logic
☐ Rate limiting per user/API key
☐ Circuit breaker for downstream dependencies
Observability:
☐ Request logging (prompt, response, latency, token count)
☐ Cost tracking (tokens × price per token)
☐ Error rate alerting (>1% error rate → page on-call)
☐ Latency percentiles (p50, p95, p99 in Grafana)
Security:
☐ API keys rotated, stored in secrets manager (not in code)
☐ Input validation and sanitization
☐ Output moderation (especially for public-facing apps)
☐ Audit logs for compliance
☐ Network isolation (model servers not publicly accessible)
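Several of the reliability items (retry logic, a fallback model) can be combined in one small wrapper. The sketch below is provider-agnostic. `primary` and `fallback` are any callables that take a prompt and return text, for example thin wrappers around `bedrock_chat` or an OpenAI client. Which exceptions count as retryable is an assumption; narrow it to your SDK's throttling and timeout errors.

```python
import random
import time
from collections.abc import Callable


def call_with_retry_and_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    max_retries: int = 3,
    base_delay: float = 0.5,
    retryable: tuple[type[Exception], ...] = (TimeoutError, ConnectionError),
) -> str:
    """Try the primary model up to max_retries times with backoff, then use the fallback."""
    for attempt in range(max_retries):
        try:
            return primary(prompt)
        except retryable:
            if attempt < max_retries - 1:
                # Backoff doubles each attempt (0.5s, 1s, ...), with +/-50% random jitter
                time.sleep(base_delay * 2 ** attempt * random.uniform(0.5, 1.5))
    # Primary exhausted its retries: route the request to the fallback model
    return fallback(prompt)
```

Non-retryable errors (such as a malformed request) propagate immediately. Retrying them would only add latency.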
Summary
| Topic | Key Takeaway |
|---|---|
| Code AI | Copilot and local models handle many routine dev tasks; code review and test generation are immediate wins |
| Business AI | Document intelligence + meeting summaries + report generation save hours per day |
| Education AI | Adaptive tutors + auto-quizzes + literature review are highest-impact education uses |
| Healthcare AI | AlphaFold for drug discovery; always validate clinically; HIPAA/PDPA compliance mandatory |
| Marketing AI | Personalized content at scale; A/B test AI copy variants; measure conversion, not just production |
| AI Agents | LLM + tools + a loop; agents can act autonomously but need guardrails |
| Cloud Deployment | API first; Bedrock/Vertex/Azure for enterprise; self-hosted vLLM for data privacy + cost |
Next → Part 5: Career & Capstone Projects — build your portfolio, prepare for interviews, and chart your AI career path.
Practice Challenge
Build a minimal AI agent this week:
- Give it two tools: `calculator` and `get_current_date`
- Test with: "How many days until the new year? And what is 365 × 24?"
- Watch the agent decide which tools to call in what order
- Add a third tool: `get_weather` using a free API
- Deploy it as a simple Flask API on any cloud free tier
Questions or discussion? Connect on LinkedIn, X or reach out via email.